2020
DOI: 10.1007/978-3-030-58542-6_19

Active Visual Information Gathering for Vision-Language Navigation

Cited by 57 publications (48 citation statements)
References 23 publications
“…As a crucial step towards building intelligent robots, autonomous navigation has long been studied in the robotics community. Recently, vision-based navigation [16] has attracted growing attention in the computer vision community and has been explored in many other forms, such as point-goal-based or object-oriented navigation (i.e., reaching a specific position or finding an object) [97,16,14,22], natural-language-guided navigation [3,86], audio-assisted navigation [17], and settings ranging from indoor environments [3] to street scenes [18]. Prominent early methods rely on a pre-given or pre-computed global map [8,44] for path planning, while later ones use simultaneous localization and mapping (SLAM) techniques [84,31] that reconstruct the map on the fly.…”
Section: Related Work (mentioning)
confidence: 99%
“…To further boost learning, some methods mine extra supervisory signals from synthesized samples [13,45,14] or auxiliary tasks [48,25,33,55]. For more intelligent path planning, the abilities of self-correction [28,34] and active exploration [47] have also been addressed. Other recent studies address environment-agnostic representation learning [49], fine-grained instruction grounding [23,40,22], and self-pretraining on paired web image-text data [35,18], or perform VLN in continuous environments [29].…”
Section: Related Work (mentioning)
confidence: 99%
“…Since Anderson et al. [3] extended prior efforts [9,37] in instruction-based navigation to photo-realistic simulated scenes [6], vision-language navigation (VLN) has recently attracted increasing attention in the computer vision community. Towards the goal of enabling an agent to execute navigation instructions in 3D environments, representative VLN methods have made great advances in: i) developing more powerful learning paradigms [50,48]; ii) exploring extra supervision signals from synthesized data [13,45,14] or auxiliary tasks [48,25,33,55]; iii) designing more efficient multi-modal embedding schemes [24,40,49]; and iv) making more intelligent path planning [28,34,47].…”
Section: Introduction (mentioning)
confidence: 99%
“…We also analyze the training time and memory consumption at different batch sizes and caption sequence lengths. In addition to image captioning, it should be possible to attain similar benefits in other vision-language tasks such as visual question answering, dialog, and vision-language navigation [35][36][37][38][39].…”
Section: Image Captioning (mentioning)
confidence: 99%