2019 IEEE/CVF International Conference on Computer Vision (ICCV)
DOI: 10.1109/iccv.2019.00297
Situational Fusion of Visual Representation for Visual Navigation

Abstract: A complex visual navigation task puts an agent in different situations which call for a diverse range of visual perception abilities. For example, to "go to the nearest chair", the agent might need to identify a chair in a living room using semantics, follow along a hallway using vanishing point cues, and avoid obstacles using depth. Therefore, utilizing the appropriate visual perception abilities based on a situational understanding of the visual environment can empower these navigation models in unseen visua…
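
The abstract's core idea, weighting several visual representations by a situation-dependent gate, can be illustrated with a minimal sketch. This is not the paper's implementation: the module names, feature dimensions, and the softmax gating over concatenated features are all assumptions standing in for whatever "situational understanding" mechanism the paper actually uses.

```python
import torch
import torch.nn as nn

class SituationalFusion(nn.Module):
    """Illustrative sketch: fuse features from several visual
    representations (e.g. semantics, depth, vanishing point) with
    weights predicted from the current observation."""

    def __init__(self, feat_dim: int, num_reps: int, num_actions: int):
        super().__init__()
        # Gate predicts one weight per representation from the
        # concatenated features (a stand-in for situational understanding).
        self.gate = nn.Sequential(
            nn.Linear(feat_dim * num_reps, 128),
            nn.ReLU(),
            nn.Linear(128, num_reps),
        )
        self.policy = nn.Linear(feat_dim, num_actions)

    def forward(self, rep_feats: torch.Tensor) -> torch.Tensor:
        # rep_feats: (batch, num_reps, feat_dim), one row per representation.
        weights = torch.softmax(self.gate(rep_feats.flatten(1)), dim=-1)
        fused = (weights.unsqueeze(-1) * rep_feats).sum(dim=1)
        return self.policy(fused)  # action logits

# Usage: three representations, 512-d features, 4 discrete actions.
model = SituationalFusion(feat_dim=512, num_reps=3, num_actions=4)
logits = model(torch.randn(2, 3, 512))
```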

Cited by 59 publications (28 citation statements)
References 40 publications
“…Our empirical results and analysis have shown several directions to pursue in the future. First, we need to develop more advanced component technologies integral to task learning, e.g., more advanced navigation modules through either more effective structures (Hong et al., 2020) or richer perceptions (Shen et al., 2019) to solve the navigation bottleneck. We need to develop better representations and more robust and adaptive learning algorithms to support self-monitoring and backtracking…”
Section: Discussion
confidence: 99%
“…Image captioning (Anderson et al., 2018b; Vinyals et al., 2015; Xu et al., 2015), visual question answering (Antol et al., 2015; Goyal et al., 2017), and visual dialog (Das et al., 2017a,b) are examples of active research areas in this field. At the same time, visual navigation (Gupta et al., 2017; Shen et al., 2019; Xia et al., 2018) and goal-oriented instruction following (Chen et al., 2019; Fu et al., 2019; Qi et al., 2020b) represent an important part of current work on embodied AI (Das et al., 2018a,b; Savva et al., 2019; Yang et al., 2019). In this context, Vision-and-Language Navigation (VLN) (Anderson et al., 2018c) constitutes a peculiar challenge, as it enriches traditional navigation with a set of visually rich environments and detailed instructions…”
Section: Related Work
confidence: 99%
“…Therefore, this paper also investigated the action decision to evaluate the practicability of the proposed method. As prior work [8] pointed out, fusion at the action level, which predicts an action candidate from each representation and adaptively consolidates these candidates into the final action, reduces redundancies and improves generalization…”
Section: Fusion at the Action Level
confidence: 99%
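
The action-level fusion this excerpt describes, one action candidate per representation consolidated adaptively, can be sketched as follows. This is a minimal sketch under assumed details: discrete actions, per-representation linear policy heads, and a learned scalar confidence per representation, none of which are confirmed as the cited paper's exact design.

```python
import torch
import torch.nn as nn

class ActionLevelFusion(nn.Module):
    """Sketch: each representation proposes its own action logits; a
    learned confidence per representation consolidates the candidates."""

    def __init__(self, feat_dim: int, num_reps: int, num_actions: int):
        super().__init__()
        # One policy head per representation -> per-representation logits.
        self.heads = nn.ModuleList(
            nn.Linear(feat_dim, num_actions) for _ in range(num_reps)
        )
        # Scalar confidence per representation, from its own features.
        self.confidence = nn.Linear(feat_dim, 1)

    def forward(self, rep_feats: torch.Tensor) -> torch.Tensor:
        # rep_feats: (batch, num_reps, feat_dim)
        logits = torch.stack(
            [head(rep_feats[:, i]) for i, head in enumerate(self.heads)],
            dim=1,
        )  # (batch, num_reps, num_actions)
        conf = torch.softmax(self.confidence(rep_feats).squeeze(-1), dim=-1)
        return (conf.unsqueeze(-1) * logits).sum(dim=1)  # fused logits
```

Fusing at the action level rather than the feature level keeps each representation's pathway independent, which is one plausible reading of why the excerpt credits it with reduced redundancy and better generalization.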
“…The representation of the environment can be extracted with computer vision techniques [7], including semantic segmentation, depth perception, object classes, room layout and scene class [8]. However, most of the high-level representation methods mentioned above require costly, high-performance computational resources…”
Section: Introduction
confidence: 99%
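
The representation stack the excerpt lists can be approximated with off-the-shelf models; the sketch below uses publicly available torchvision and torch.hub models purely as stand-ins, since the excerpt does not name specific networks. Model choices, the omitted per-model normalization, and the use of an ImageNet classifier in place of a dedicated object/scene recognizer are all assumptions.

```python
import torch
from torchvision import models
from torchvision.models.segmentation import deeplabv3_resnet50

class RepresentationExtractor:
    """Sketch: pull several mid-level representations from one RGB frame.
    Running a separate network per representation is exactly the
    computational cost the excerpt warns about."""

    def __init__(self):
        self.segmenter = deeplabv3_resnet50(weights="DEFAULT").eval()
        # Monocular depth via the public MiDaS hub model (assumption).
        self.depth = torch.hub.load("intel-isl/MiDaS", "MiDaS_small").eval()
        # ImageNet classifier as a stand-in for object/scene recognition.
        self.classifier = models.resnet18(weights="DEFAULT").eval()

    @torch.no_grad()
    def extract(self, frame: torch.Tensor) -> dict:
        # frame: (1, 3, H, W); per-model normalization omitted for brevity.
        return {
            "semantics": self.segmenter(frame)["out"].argmax(dim=1),
            "depth": self.depth(frame),
            "object_logits": self.classifier(frame),
        }
```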