2021
DOI: 10.48550/arxiv.2101.05181
Preprint
Memory-Augmented Reinforcement Learning for Image-Goal Navigation

Lina Mezghani,
Sainbayar Sukhbaatar,
Thibaut Lavril
et al.

Abstract: In this work, we address the problem of image-goal navigation in the context of visually realistic 3D environments. This task involves navigating to a location indicated by a target image in a previously unseen environment. Earlier attempts, including RL-based and SLAM-based approaches, have either shown poor generalization performance or are heavily reliant on pose/depth sensors. We present a novel method that leverages a cross-episode memory to learn to navigate. We first train a state-embedding network in …
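The abstract describes a cross-episode memory built on learned state embeddings. As a minimal illustration of the retrieval step such a memory implies, the sketch below stores state embeddings and looks up the nearest stored state for a new observation by cosine similarity. All names, dimensions, and the retrieval rule are hypothetical assumptions for illustration, not details from the paper.

```python
import numpy as np

def cosine_retrieve(memory: np.ndarray, query: np.ndarray, k: int = 3) -> np.ndarray:
    """Return indices of the k stored embeddings most similar to the query."""
    mem_norm = memory / np.linalg.norm(memory, axis=1, keepdims=True)
    q_norm = query / np.linalg.norm(query)
    sims = mem_norm @ q_norm          # cosine similarity against every stored state
    return np.argsort(-sims)[:k]      # indices of the k highest similarities

# Hypothetical cross-episode memory: 100 stored state embeddings of dimension 16.
rng = np.random.default_rng(0)
memory = rng.standard_normal((100, 16))
# A new observation embedding close to stored state 42 (small perturbation).
query = memory[42] + 0.01 * rng.standard_normal(16)
top = cosine_retrieve(memory, query)
print(top[0])  # index of the closest stored state
```

The design choice sketched here (normalized dot-product retrieval) is one common way to query an embedding memory; the paper's actual memory mechanism may differ.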

Cited by 6 publications (9 citation statements)
References 20 publications
“…With increased data and steps, RL baselines unsurprisingly improve in performance; however, we find that with 5x more data and 10x more compute, RL baselines are still outperformed by NRNS. The low performance of behavioral cloning and RL methods for image-goal navigation is unsurprising [4,15]. This demonstrates the difficulty of learning rewards on low-level actions instead of value learning on possible exploration directions, exacerbating the difficulty of exploration in image-goal navigation.…”
Section: Results
confidence: 99%
“…Navigation tasks largely fall into two main categories [1], ones in which a goal location is known [11,12,13] and limited exploration is required, and ones in which the goal location is not known and efficient exploration is necessary. In the second category, tasks range from finding the location of specific objects [5], rooms [14], or images [15], to the task of exploration itself [2]. The majority of current work [12,15,16,3] leverages simulators [7] and extensive interaction to learn end-to-end models for these tasks.…”
Section: Related Work
confidence: 99%
“…Transformer Memory. Transformers [71] have been shown to do very well on long-horizon embodied tasks like navigation and exploration [11,13,17,24,46,47,57]. The performance gains arise from a transformer's ability to effectively leverage past experiences of the agent [11,24,46,47,57] and also to do cross-modal reasoning [13,17]. Different from these methods, our idea is to use a transformer as a memory model for capturing long-range acoustic correlations for audio-visual separation.…”
Section: Related Work
confidence: 99%