2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr.2019.01281

Vision-Based Navigation With Language-Based Assistance via Imitation Learning With Indirect Intervention

Abstract: We present Vision-based Navigation with Language-based Assistance (VNLA), a grounded vision-language task where an agent with visual perception is guided via language to find objects in photorealistic indoor environments. The task emulates a real-world scenario in that (a) the requester may not know how to navigate to the target objects and thus makes requests by only specifying high-level end-goals, and (b) the agent is capable of sensing when it is lost and querying an advisor, who is more qualified at the task…
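As a rough illustration of the help-requesting loop the abstract describes (an agent pursues a high-level end-goal and asks an advisor for guidance when it senses it is lost), here is a minimal, self-contained sketch. All names (`Agent`, `Advisor`, `is_lost`, `help_budget`) are hypothetical placeholders, not the paper's actual interfaces, and the decision rule and policy are stubs rather than the learned components used in the work.

```python
import random

# Illustrative sketch only: a toy agent that navigates toward an end-goal
# and, when it judges itself lost, spends limited "help budget" to query
# an advisor for a short language subgoal. Names and logic are assumptions,
# not the paper's implementation.

class Advisor:
    def subgoal(self, agent_state, goal):
        # A real advisor would return a grounded language instruction
        # (e.g., "turn left and enter the kitchen").
        return f"move one step closer to the {goal}"

class Agent:
    def __init__(self, advisor, help_budget=3):
        self.advisor = advisor
        self.help_budget = help_budget  # assistance is limited

    def is_lost(self, state):
        # Placeholder uncertainty check; the paper learns when to ask for help.
        return random.random() < 0.2

    def act(self, state, instruction):
        # Placeholder policy step conditioned on the current instruction.
        return state + 1

    def navigate(self, goal, max_steps=20):
        state, instruction = 0, f"find the {goal}"
        for _ in range(max_steps):
            if self.is_lost(state) and self.help_budget > 0:
                instruction = self.advisor.subgoal(state, goal)
                self.help_budget -= 1
            state = self.act(state, instruction)
        return state

if __name__ == "__main__":
    print(Agent(Advisor()).navigate("mug"))
```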

Cited by 85 publications (57 citation statements)
References 37 publications
“…Moreover, we introduce a self-supervised imitation learning method for exploration in order to explicitly address the generalization issue, which is a problem not well-studied in prior work. Concurrent to our work, [44,23,27,28] studies the VLN tasks from various aspects, and [31] introduces a variant of the VLN task to find objects by requesting language assistance when needed. Note that we are the first to propose to explore unseen environments for the VLN task.…”
Section: Related Work (mentioning)
confidence: 97%
“…Historically, semantic parsing was used to map natural language instructions to visual navigation in simulation environments (Chen and Mooney, 2011;MacMahon et al, 2006). Modern approaches use neural architectures to map natural language to the (simulated) world and execute actions (Paxton et al, 2019;Chen et al, 2018;Nguyen et al, 2018;Blukis et al, 2018;Fried et al, 2018;Mei et al, 2016). In visual question answering (VQA) (Antol et al, 2015;Hudson and Manning, 2019) and visual commonsense reasoning (VCR) (Zellers et al, 2019), input images are accompanied with natural language questions.…”
Section: Related Work (mentioning)
confidence: 99%
“…Visual-and-language navigation (VLN) [87,88,118-121] is a multimodal task that has become increasingly popular in recent years. The idea behind VLN is to combine several active domains (i.e., natural language, vision, and action) to enable robots (intelligent agents) to navigate easily in unstructured environments.…”
Section: Vision-and-Language Navigation (mentioning)
confidence: 99%