2022
DOI: 10.48550/arxiv.2201.09862
Preprint

Learning to Act with Affordance-Aware Multimodal Neural SLAM

Abstract: Recent years have witnessed an emerging paradigm shift toward embodied artificial intelligence, in which an agent must learn to solve challenging tasks by interacting with its environment. There are several challenges in solving embodied multimodal tasks, including long-horizon planning, vision-and-language grounding, and efficient exploration. We focus on a critical bottleneck, namely the performance of planning and navigation. To tackle this challenge, we propose a Neural SLAM approach that, for the first ti…

Cited by 2 publications (3 citation statements)
References 18 publications
“…In simulated environments, Logeswaran et al. (2022) propose a language-only finetuned GPT-2 model for task planning on ALFRED. Some end-to-end ALFRED models also have task planning as a component (Min et al., 2021; Jia et al., 2022; Blukis et al., 2022). However, this is a simpler dataset where task planning can be cast as a 7-way classification problem.…”
Section: Related Work
confidence: 99%
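The "7-way classification" remark refers to ALFRED's fixed set of seven high-level task types, so task planning reduces to predicting a single label from the task's language instruction. Below is a minimal sketch of such a classifier; the module name, encoder dimensions, and pooled-embedding input are illustrative assumptions and do not reproduce the cited models.

# Hypothetical sketch (not the cited papers' implementation): cast ALFRED task
# planning as 7-way classification over a pooled language-instruction embedding.
import torch
import torch.nn as nn

NUM_TASK_TYPES = 7  # ALFRED defines seven high-level task types

class TaskTypeClassifier(nn.Module):
    def __init__(self, embed_dim: int = 768, hidden_dim: int = 256):
        super().__init__()
        # Small MLP head over a fixed-size instruction embedding (assumed to
        # come from some pretrained sentence encoder).
        self.head = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, NUM_TASK_TYPES),
        )

    def forward(self, instruction_embedding: torch.Tensor) -> torch.Tensor:
        # instruction_embedding: (batch, embed_dim) -> (batch, 7) task-type logits
        return self.head(instruction_embedding)

# Usage with random features standing in for encoder output:
model = TaskTypeClassifier()
logits = model(torch.randn(4, 768))
predicted_task_type = logits.argmax(dim=-1)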
“…In such a system, the coffee task considered above would likely start by invoking a semantic navigation module to find the mug and a grasping module to pick it up. Some prior work on embodied AI benchmarks suggests that more modular models can outperform monolithic models (Min et al., 2021; Jia et al., 2022; Zheng et al., 2022; Min et al., 2022). However, these do not evaluate and explore the limitations of individual modules.…”
Section: Introduction
confidence: 99%
“…We observe small improvements in success rate of up to 2 points when the language input is marked up with dialog acts, either at the end or at both the start and end of an utterance, but less benefit is observed from speaker information. We believe that stronger improvements will likely be observed when using a more modular approach (e.g., Min et al., 2021), where it is easier to decouple the effects of errors arising from language understanding from those arising from navigation, which is the most difficult component when predicting such low-level actions (Blukis et al., 2022; Jia et al., 2022; Min et al., 2021).…”
Section: Execution From Dialog History
confidence: 99%