Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021
DOI: 10.18653/v1/2021.findings-acl.368
Hierarchical Task Learning from Language Instructions with Unified Transformers and Self-Monitoring

Abstract: Despite recent progress, learning new tasks through language instructions remains an extremely challenging problem. On the ALFRED benchmark for task learning, the published state-of-the-art system only achieves a task success rate of less than 10% in an unseen environment, compared to the human performance of over 90%. To address this issue, this paper takes a closer look at task learning. In a departure from a widely applied end-to-end architecture, we decomposed task learning into three sub-problems: sub-goal planning, scene navigation, and object manipulation…
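The decomposition the abstract describes (one shared model serving sub-goal planning, scene navigation, and object manipulation) can be pictured with a minimal sketch. The following is a hypothetical PyTorch illustration, not the authors' HiTUT code: the class name, head names, and all dimensions are invented for exposition.

```python
# Hypothetical sketch (not the authors' implementation): one transformer trunk
# shared across the three sub-problems named in the abstract, each with its own
# prediction head, so all sub-problems are handled by a unified model.
import torch
import torch.nn as nn

class UnifiedTaskModel(nn.Module):
    def __init__(self, vocab_size=10000, d_model=256, n_heads=8, n_layers=4,
                 n_subgoals=8, n_nav_actions=12, n_manip_actions=7):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.trunk = nn.TransformerEncoder(layer, n_layers)
        # One head per sub-problem; all heads read the same shared encoding.
        self.heads = nn.ModuleDict({
            "subgoal": nn.Linear(d_model, n_subgoals),          # sub-goal planning
            "navigate": nn.Linear(d_model, n_nav_actions),      # scene navigation
            "manipulate": nn.Linear(d_model, n_manip_actions),  # object manipulation
        })

    def forward(self, tokens, task):
        h = self.trunk(self.embed(tokens))   # (batch, seq, d_model)
        pooled = h.mean(dim=1)               # simple pooled state summary
        return self.heads[task](pooled)      # logits for the chosen sub-problem

model = UnifiedTaskModel()
instr = torch.randint(0, 10000, (2, 16))    # toy tokenized instructions
print(model(instr, "subgoal").shape)        # torch.Size([2, 8])
```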

Cited by 28 publications (11 citation statements). References 34 publications (33 reference statements).
“…In “MoViLan + PerfectMap”, ground-truth BEV maps are provided to the agent as an ablation study for the removal of our mapping module. Our framework demonstrates superior performance on complete tasks compared to the baseline algorithms (Shridhar et al., 2020), Moca (Singh et al., 2020), HiTUT (Zhang and Chai, 2021), HLSM (Blukis et al., 2022), and LWIT (Nguyen et al., 2021). For sub-goal tasks, our framework has significantly higher path-weighted success rates for “GoTo” (language instructions requiring pure navigation) than previous works because of the novel mapping module, and hence higher overall success rates due to better positioning.…”
Section: Results (mentioning)
confidence: 92%
“…The VPM module in this study executes the interaction mask of the target object, and the APM module predicts the action sequence. The HiTUT method (Zhang and Chai, 2021) tries to increase the success rate on the ALFRED dataset by decomposing task learning into three sub-tasks: sub-goal planning, scene navigation, and object manipulation. All three sub-tasks share a similar input form; therefore, they are solved together by applying a unified model based on multi-task learning.…”
Section: Introduction (mentioning)
confidence: 99%
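The shared input form mentioned in the statement above is what makes joint training possible. Below is a hypothetical multi-task training loop in that spirit, using toy random data; the head names, sizes, and round-robin schedule are assumptions for illustration, not details from HiTUT.

```python
# Hypothetical multi-task loop: because the sub-tasks share one input form
# (token sequences), a single trunk can be optimized on batches from all of
# them. All data here is random and purely illustrative.
import torch
import torch.nn as nn

heads = {"subgoal": 8, "navigate": 12, "manipulate": 7}
trunk = nn.Sequential(nn.Embedding(1000, 64), nn.Flatten(1), nn.Linear(16 * 64, 64))
outs = nn.ModuleDict({k: nn.Linear(64, n) for k, n in heads.items()})
opt = torch.optim.Adam(list(trunk.parameters()) + list(outs.parameters()))
loss_fn = nn.CrossEntropyLoss()

for step in range(3):
    for task, n_classes in heads.items():         # round-robin over sub-tasks
        tokens = torch.randint(0, 1000, (4, 16))  # shared input form
        labels = torch.randint(0, n_classes, (4,))
        loss = loss_fn(outs[task](trunk(tokens)), labels)
        opt.zero_grad()
        loss.backward()
        opt.step()
```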
“…
• Learning from Explanations with Neural Execution Tree (Wang et al., 2020)
• Benchmarking Zero-shot Text Classification: Datasets, Evaluation and Entailment Approach (Yin et al., 2019)
• Textual Entailment for Event Argument Extraction: Zero- and Few-Shot with Multi-Source Learning (Sainz et al., 2022)
• Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing (Liu et al., 2021)
• True Few-Shot Learning With Prompts-A Real-World Perspective (Schick and Schütze, 2022)
• The Turking Test: Can Language Models Understand Instructions? (Efrat and Levy, 2020)
• Hierarchical Task Learning from Language Instructions with Unified Transformers and Self-Monitoring (Zhang and Chai, 2021)
• Cross-Task Generalization via Natural Language Crowdsourcing Instructions (Mishra et al., 2022)
• MUFFIN: Curating Multi-Faceted Instructions for Improving Instruction Following (Lou et al., 2023)
…”
Section: Appendix A (mentioning)
confidence: 99%
“…Several approaches have been proposed to solve this. Predominant methods exploit the embodied nature of a robotic agent to infer and refine the plan, primarily using multi-modal input that includes visual feedback and action priors (Paxton et al., 2019; Shridhar et al., 2020; Zhang and Chai, 2021; Ahn et al., 2022). Thus, natural language understanding in these systems is simplified by obtaining a latent representation of the language input to bias the inference using attention modeling.…”
Section: Related Work (mentioning)
confidence: 99%
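The last statement's point, that a latent representation of the language input biases inference through attention, is easy to see in isolation. The following is a minimal, hypothetical sketch of such cross-attention conditioning; the shapes and variable names are illustrative and not drawn from any of the cited systems.

```python
# Hypothetical sketch of attention-based conditioning: a latent encoding of
# the instruction biases visual inference via cross-attention. Shapes are
# illustrative only.
import torch
import torch.nn as nn

d_model = 256
cross_attn = nn.MultiheadAttention(d_model, num_heads=8, batch_first=True)

vision = torch.randn(2, 49, d_model)    # e.g. a 7x7 grid of visual features
language = torch.randn(2, 16, d_model)  # latent representation of the instruction

# Visual features query the language latent: instruction tokens relevant to
# the current scene receive high weight, biasing downstream action inference.
fused, weights = cross_attn(query=vision, key=language, value=language)
print(fused.shape, weights.shape)  # torch.Size([2, 49, 256]) torch.Size([2, 49, 16])
```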