Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume 2021
DOI: 10.18653/v1/2021.eacl-main.103
Multimodal Text Style Transfer for Outdoor Vision-and-Language Navigation

Abstract: One of the most challenging topics in Natural Language Processing (NLP) is visually-grounded language understanding and reasoning. Outdoor vision-and-language navigation (VLN) is such a task, where an agent follows natural language instructions and navigates a real-life urban environment. Due to the lack of human-annotated instructions that illustrate intricate urban scenes, outdoor VLN remains a challenging task to solve. This paper introduces a Multimodal Text Style Transfer (MTST) learning approach and levera…

Cited by 23 publications (35 citation statements). References 35 publications.
“…To facilitate experiments, Zhu et al. [145] divided the original StreetLearn dataset into a smaller subset, Manh-50, which mainly covers the Manhattan area with 31K training instances. In addition, a baseline Multimodal Text Style Transfer learning approach is proposed to generate style-modified instructions from external resources and address the data scarcity issue.…”
Section: Street View Navigation
confidence: 99%
“…Variations of VLN include indoor navigation [3,33,66,40], street-level navigation [9,53], vision-and-dialog navigation [59,74,26], VLN in continuous environments [39], and more. Notwithstanding considerable exploration of pretraining strategies [46,27,50,87], data augmentation approaches [20,21,73], and agent architectures and loss functions [86,48,49], existing work in this space considers only model-free approaches. Our aim is to unlock model-based approaches to these tasks, using a visual world model to encode prior commonsense knowledge about human environments and thereby relieve the burden on the agent to learn these regularities.…”
Section: Related Work
confidence: 99%
“…Through the years, agents with different model architectures and training mechanisms have been proposed for indoor VLN [1,5,6,9,10,11,14,19,23,31,37,38,39,40,41,46] and outdoor VLN [3,24,26,43,44,47]. Backtranslation eases the urgent problem of data scarcity [5].…”
Section: Introduction
confidence: 99%
“…Imitation learning and reinforcement learning enhance agents' generalization ability [39,40]. With the rise of BERT-based models, researchers also apply Transformers and pretraining to further improve navigation performance [6,10,47]. While applying new techniques to navigation agents might boost their performance, we still know little about how agents make each turning decision.…”
Section: Introduction
confidence: 99%