Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
DOI: 10.18653/v1/2020.emnlp-main.356

Room-Across-Room: Multilingual Vision-and-Language Navigation with Dense Spatiotemporal Grounding

Abstract: We introduce Room-Across-Room (RxR), a new Vision-and-Language Navigation (VLN) dataset. RxR is multilingual (English, Hindi, and Telugu) and larger (more paths and instructions) than other VLN datasets. It emphasizes the role of language in VLN by addressing known biases in paths and eliciting more references to visible entities. Furthermore, each word in an instruction is time-aligned to the virtual poses of instruction creators and validators. We establish baseline scores for monolingual and multilingual settings…
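As context for the dense spatiotemporal grounding described in the abstract, here is a minimal, hypothetical sketch in Python of what a word-level time alignment to annotator poses could look like. The field names (word, start_s, pano_id, heading_deg, and so on) are illustrative assumptions, not the released RxR schema.

# Hypothetical record shape: each instruction word spans a time interval
# and maps to the annotator's virtual pose during that interval.
alignment = [
    {"word": "walk", "start_s": 0.0, "end_s": 0.4,
     "pose": {"pano_id": "p0", "heading_deg": 90.0, "pitch_deg": 0.0}},
    {"word": "past", "start_s": 0.4, "end_s": 0.7,
     "pose": {"pano_id": "p0", "heading_deg": 95.0, "pitch_deg": 0.0}},
]

# Example query: the words grounded while the annotator stood at panorama p0.
words_at_p0 = [a["word"] for a in alignment if a["pose"]["pano_id"] == "p0"]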

Cited by 151 publications (165 citation statements) | References 35 publications
“…The same holds for DTW measures: Ilharco et al (2019) report a success rate of 44% and a corresponding SDTW of 38.3% for a fidelity-oriented version of the Reinforced Cross-modal Matching agent (Wang et al, 2019). Ku et al (2020) report lower SDTW scores of 21% to 24%. Given this, the TC of 12.8% and SDTW of 1.4% obtained by Retouch-RCONCAT, together with the current best results from Xiang et al (2020) (TC: 19.0%; SDTW: 16.3%), amply demonstrate the challenge of the outdoor navigation problem defined by Touchdown.…”
Section: Methods
confidence: 83%
“…the current state-of-the-art success rate (equivalent to TC) for R2R on the validation unseen split is 55% (Zhu et al, 2019). It is even considerably harder than the Room-Across-Room dataset, which has longer, more challenging paths than R2R and success rates of 26% to 30% across its three languages (Ku et al, 2020). The same holds for DTW measures: Ilharco et al (2019) report a success rate of 44% and a corresponding SDTW of 38.3% for a fidelity-oriented version of the Reinforced Cross-modal Matching agent (Wang et al, 2019).…”
Section: Methods
confidence: 95%
“…Similarity between the annotated (Guide) path and the Follower path is also a natural measure of the joint quality of both the Guide and the Follower annotations. In the experiments for RxR, the path extracted from the Follower's pose trace was also used as additional supervision when training Follower agents, since it represents a step-by-step account of how a human solved the task and the visual inputs they focused on in order to do so (Ku et al, 2020).…”
Section: PanGEA Toolkit
confidence: 99%
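To make the pose-trace supervision described above concrete, here is a minimal sketch of how a step-by-step viewpoint path might be recovered from a time-aligned pose trace. It assumes each pose record carries the id of the panorama the annotator occupied; the field name pano_id and the record format are hypothetical, not the official RxR or PanGEA schema.

def path_from_pose_trace(pose_trace):
    # Collapse the time-aligned pose trace into a viewpoint path by
    # dropping consecutive duplicate panoramas.
    path = []
    for pose in pose_trace:
        pano = pose["pano_id"]  # assumed field name
        if not path or path[-1] != pano:
            path.append(pano)
    return path

A path recovered this way records the order in which a human visited viewpoints while following the instruction, which is what makes it usable as step-by-step supervision for a Follower agent.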
“…The release of high-quality 3D building and street captures (Chang et al, 2017; Mirowski et al, 2019; Mehta et al, 2020; Xia et al, 2018; Straub et al, 2019) has galvanized interest in developing embodied navigation agents that can operate in complex human environments. Based on these environments, annotations have been collected for a variety of tasks, including navigating to a particular class of object (ObjectNav) (Batra et al, 2020), navigating from language instructions, aka vision-and-language navigation (VLN) (Anderson et al, 2018b; Qi et al, 2020; Ku et al, 2020), and vision-and-dialog navigation (Thomason et al, 2020; Hahn et al, 2020). To date, most of these data collection efforts have required the development of custom annotation tools.…”
Section: Introduction
confidence: 99%