2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2021
DOI: 10.1109/iccv48922.2021.00162
Self-Motivated Communication Agent for Real-World Vision-Dialog Navigation

Cited by 21 publications (12 citation statements). References 20 publications.
“…An intelligent agent asks for help when uncertain about the next action (Nguyen et al., 2021b). Action probabilities or a separately trained model (Chi et al., 2020; Zhu et al., 2021e; Nguyen et al., 2021a) can be leveraged to decide whether to ask for help. Using natural language to converse with the oracle covers a wider problem scope than sending a signal.…”
Section: Asking For Help
confidence: 99%
“…Almost all instruction-following dialogue tasks need to consider contextual information and actions as well as the state of the world (Suhr and Artzi, 2018; Lachmy et al., 2021), which remains a key challenge for this family of tasks. In particular, the Vision-and-Dialog Navigation (VDN) task (Roman et al., 2020; Zhu et al., 2021), where question-answering dialogue and visual contexts are leveraged to facilitate navigation, has attracted increasing research attention. Other tasks, such as moving-blocks tasks (Misra et al., 2017) and object-finding tasks (Janner et al., 2018), also require modelling both contextual information in natural language and the world-state representation.…”
Section: Related Work and Background
confidence: 99%
“…The problem of instruction following for navigation has drawn significant attention in a wide range of domains. These include Google Street View panoramas [11], simulated environments for quadcopters [5], multilingual settings [33], interactive vision-dialogue setups [60], real-world scenes [3], and realistic simulations of indoor scenes [4]. More relevant to our work is the literature on the Vision-and-Language Navigation (VLN) task, initially defined in [4] on navigation graphs (R2R) in the Matterport3D [8] dataset, and later converted to continuous environments in [32] (VLN-CE).…”
Section: Related Work
confidence: 99%