2021 IEEE International Conference on Robotics and Automation (ICRA)
DOI: 10.1109/icra48506.2021.9561806
Hierarchical Cross-Modal Agent for Robotics Vision-and-Language Navigation

Cited by 33 publications (17 citation statements)
References 20 publications
“…The action space A available to the agent consists of navigation actions between physical states, and a stop action which determines the end of a solution. Navigation actions can be discrete, e.g., turn left, turn right, or move forward (Anderson et al., 2018c; Krantz et al., 2020), as well as continuous (Irshad et al., 2021). Finally, the goal of the agent is to predict a solution ψ̂^Φ_VLN consisting of a sequence of actions a_t ∈ A, t ∈ [1, T], that most closely aligns with the instruction and, thus, with the true solution ψ^Φ_VLN.…”
Section: Tasks in Embodied Vision-Language Planning
confidence: 99%
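To make the quoted formulation concrete, the sketch below shows a discrete VLN action space and an episode rollout that ends on a stop action. It is only an illustration of the setup described in the excerpt: the Action names, the env.reset/env.step interface, and the agent.act method are assumptions, not the API of the cited paper or of any VLN framework.

```python
from enum import Enum
from typing import List

class Action(Enum):
    """Discrete VLN actions (illustrative names, assumed for this sketch)."""
    MOVE_FORWARD = 0
    TURN_LEFT = 1
    TURN_RIGHT = 2
    STOP = 3  # declares the end of the predicted solution

def rollout(agent, env, instruction: str, max_steps: int = 100) -> List[Action]:
    """Roll out a policy until STOP or the step limit, returning the action sequence a_1..a_T."""
    obs = env.reset(instruction)          # hypothetical environment API
    actions: List[Action] = []
    for _ in range(max_steps):
        a = agent.act(obs, instruction)   # hypothetical agent API
        actions.append(a)
        if a is Action.STOP:
            break                         # stop action ends the episode
        obs = env.step(a)
    return actions                        # the predicted solution ψ̂^Φ_VLN
```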
“…VLN is the most established EVLP task: a number of datasets (see Section 4.2) exist in both indoor (Anderson et al., 2018c; Jain et al., 2019; Krantz et al., 2020; Irshad et al., 2021) and outdoor environments (Hermann et al., 2020; Misra et al., 2018). Overall, VLN models have seen considerable progress in improving the ability to get closer to the goal and to the ground-truth trajectory (Fried et al., 2018; Li et al., 2020a; Majumdar et al., 2020; Jain et al., 2019).…”
Section: Tasks in Embodied Vision-Language Planning
confidence: 99%
“…In another direction, recent approaches have relaxed the constraint of discrete traversal in VLN into continuous space ([24,25,26,27,28]). Here, the agent needs to deal with higher task complexity involving time and space.…”
Section: Vision-and-Language Navigation (VLN) and Robot Navigation
confidence: 99%
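For contrast with the discrete setting sketched earlier, a continuous-control VLN agent typically emits low-level velocity commands rather than hopping between graph nodes. The dataclass below is a minimal sketch under that assumption; the field names and units are illustrative, not the interface of any cited system.

```python
from dataclasses import dataclass

@dataclass
class ContinuousAction:
    """Continuous VLN control (illustrative): velocities applied for one timestep."""
    linear_velocity: float    # metres per second along the agent's heading
    angular_velocity: float   # radians per second about the vertical axis
    stop: bool = False        # explicit stop flag ends the episode

# Example: creep forward while turning slightly left, then stop.
trajectory = [
    ContinuousAction(linear_velocity=0.25, angular_velocity=0.1),
    ContinuousAction(linear_velocity=0.25, angular_velocity=0.0),
    ContinuousAction(linear_velocity=0.0, angular_velocity=0.0, stop=True),
]
```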