2016
DOI: 10.1609/aaai.v30i1.10364

Listen, Attend, and Walk: Neural Mapping of Navigational Instructions to Action Sequences

Abstract: We propose a neural sequence-to-sequence model for direction following, a task that is essential to realizing effective autonomous agents. Our alignment-based encoder-decoder model with long short-term memory recurrent neural networks (LSTM-RNN) translates natural language instructions to action sequences based upon a representation of the observable world state. We introduce a multi-level aligner that empowers our model to focus on sentence "regions" salient to the current world state by using multiple abstractions of the input sentence.
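The multi-level aligner described in the abstract can be illustrated with a small sketch. Below is a minimal NumPy mock-up, assuming a bilinear scoring function and illustrative dimensions; the names (multi_level_context, W) are hypothetical and this is not the authors' released code:

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over the last axis."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_level_context(word_embs, enc_hiddens, dec_state, W):
    """Attend over instruction words using multiple abstraction levels.

    word_embs:   (T, d_e) raw word embeddings (low-level abstraction)
    enc_hiddens: (T, d_h) encoder LSTM states (high-level abstraction)
    dec_state:   (d_s,)   current decoder state
    W:           (d_e + d_h, d_s) alignment matrix (assumed bilinear scoring)
    """
    # Multi-level token representation: concatenate both abstractions.
    r = np.concatenate([word_embs, enc_hiddens], axis=1)  # (T, d_e + d_h)
    scores = r @ W @ dec_state                            # (T,) alignment scores
    alpha = softmax(scores)                               # weights over words
    return alpha @ r                                      # context vector

# Toy usage with illustrative dimensions: a 5-word instruction.
T, d_e, d_h, d_s = 5, 8, 16, 16
rng = np.random.default_rng(0)
ctx = multi_level_context(rng.normal(size=(T, d_e)),
                          rng.normal(size=(T, d_h)),
                          rng.normal(size=(d_s,)),
                          rng.normal(size=(d_e + d_h, d_s)))
print(ctx.shape)  # (24,)
```

The point the abstract highlights is that attention scores and the returned context are computed over both the raw word embeddings and the encoder states, rather than over the LSTM states alone.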

Cited by 79 publications (42 citation statements)
References 18 publications

“…Vision & Language Navigation (VLN) tasks agents with taking in language instructions and a visual observation to produce an action, such as turning or moving forward, to receive a new visual observation. VLN benchmarks have evolved from the use of symbolic environment representations (MacMahon, Stankiewicz, and Kuipers 2006;Chen and Mooney 2011;Mei, Bansal, and Walter 2016) to photorealistic indoor (Anderson et al 2018) and outdoor environments (Chen et al 2019), as well as the prediction of continuous control (Blukis et al 2018). TEACh goes beyond navigation to object interactions for task completion, and beyond single instructions to dialogue.…”
Section: Related Workmentioning
confidence: 99%
“…Vision & Language Navigation (VLN) tasks agents with taking in language instructions and a visual observation to produce an action, such as turning or moving forward, to receive a new visual observation. VLN benchmarks have evolved from the use of symbolic environment representations (MacMahon, Stankiewicz, and Kuipers 2006;Chen and Mooney 2011;Mei, Bansal, and Walter 2016) to photorealistic indoor (Anderson et al 2018) and outdoor environments (Chen et al 2019), as well as the prediction of continuous control (Blukis et al 2018). TEACh goes beyond navigation to object interactions for task completion, and beyond single instructions to dialogue.…”
Section: Related Workmentioning
confidence: 99%
“…An attention mechanism (Fig. 1c) has proven to be particularly effective for various related tasks in machine translation, image caption synthesis, and language understanding (Mnih et al. 2014; Bahdanau, Cho, and Bengio 2015; Xu et al. 2015; Mei, Bansal, and Walter 2016a).…”
Section: Attention in RNN-Seq2Seq Models
confidence: 99%
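For reference, the additive ("Bahdanau-style") attention cited in this statement scores each encoder state against the current decoder state. A minimal NumPy sketch, with illustrative shapes; W_h, W_q, and v are assumed learned parameters:

```python
import numpy as np

def additive_attention(hiddens, query, W_h, W_q, v):
    """Additive attention over encoder states (illustrative sketch).

    hiddens: (T, d_h) encoder token representations
    query:   (d_q,)   current decoder state
    W_h:     (d_h, d_a), W_q: (d_q, d_a), v: (d_a,) learned parameters
    """
    scores = np.tanh(hiddens @ W_h + query @ W_q) @ v  # (T,) alignment scores
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                               # softmax-normalized weights
    return alpha @ hiddens                             # weighted context vector
```

The resulting context vector is fed to the decoder at each step, letting it focus on different input tokens over time.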
“…The original attention model introduced by Bahdanau, Cho, and Bengio (2015) uses the hidden units $h_{0:t-1}$ as the token representations $r_{0:t-1}$. Recent work (Mei, Bansal, and Walter 2016a) has demonstrated that performance can be improved by using multiple abstractions of the input, e.g., $r_i = (E_{w_i}, h_i)$, which is what we use in this work.…”
Section: Attention in RNN-LM
confidence: 99%
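The snippet's multi-level representation $r_i = (E_{w_i}, h_i)$ amounts to concatenating each word's embedding with the hidden state at the same position, mirroring the aligner sketched after the abstract above. A minimal sketch of that construction, with assumed shapes:

```python
import numpy as np

def multi_level_tokens(E, word_ids, hiddens):
    """Build r_i = (E_{w_i}, h_i): pair each word's embedding with the
    RNN hidden state at the same position.

    E: (V, d_e) embedding table; word_ids: (T,) int indices; hiddens: (T, d_h)
    Returns (T, d_e + d_h) token representations for attention to score.
    """
    return np.concatenate([E[word_ids], hiddens], axis=1)
```

Attending over these concatenated representations lets the alignment weights exploit both surface (lexical) and contextual information.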
“…Languages, be they natural or formal, afford these desirable properties [Gopnik and Meltzoff, 1987]. Based on this insight, many papers have tried to leverage the abilities of language in RL to enable communication and improve generalisation and sample efficiency [Andreas et al., 2017, Mei et al., 2016, Goyal et al., 2019, Xu et al., 2022]. The domain can be subdivided into language-conditioned RL (LC-RL), in which language conditions the formulation of the problem [Anderson et al., 2018, Goyal et al., 2019], and language-assisted RL, where language helps the agent to learn [Hu et al., 2019, Colas et al., 2020, Akakzia et al., 2020, Colas et al., 2022].…”
Section: Introduction
confidence: 99%