Housekeep: Tidying Virtual Households using Commonsense Reasoning

Kant, Yash; Ramachandran, Arun; Yenamandra, Sriram; Gilitschenski, Igor; Batra, Dhruv; Szot, Andrew; Agrawal, Harsh

doi:10.48550/arxiv.2205.10712

Cited by 4 publications

(5 citation statements)

References 0 publications


“…While there is a challenge in transferring results from simulated to real environments, simulated environments are more accessible, less expensive, and allow for the testing of technologies that may not be sufficiently safe for use in the real world (Savva et al, 2019). Additionally, while simulated environments can be used for tasks that do not require the use of language (Anderson et al, 2018a; Batra et al, 2020; Gan et al, 2020; Kant et al, 2022), they play a particularly valuable role in developing language understanding and reasoning capabilities over actions that are currently difficult for physical robots to complete, but we hope it will become a reality in the future (Kolve et al, 2017). Much of the work in language understanding for embodied AI happens using vision and language navigation, where an agent must learn to navigate through a previously unseen environment purely based on natural language route instructions (Anderson et al, 2018b;.…”

Section: B Additional Related Work (mentioning)

confidence: 99%

Multimodal Embodied Plan Prediction Augmented with Synthetic Embodied Dialogue

Padmakumar, Inan, Gella et al. 2023

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing


Embodied task completion is a challenge where an agent in a simulated environment must predict environment actions to complete tasks based on natural language instructions and egocentric visual observations. We propose a variant of this problem where the agent predicts actions at a higher level of abstraction called a plan, which helps make agent actions more interpretable and can be obtained from the appropriate prompting of large language models. We show that multimodal transformer models can outperform language-only models for this problem but fall significantly short of oracle plans. Since collecting human-human dialogues for embodied environments is expensive and time-consuming, we propose a method to synthetically generate such dialogues, which we then use as training data for plan prediction. We demonstrate that multimodal transformer models can attain strong zero-shot performance from our synthetic data, outperforming language-only models trained on human-human data. * Contributions from Mert İnan and Dilek Hakkani-Tur were provided when they were employed at Amazon.


“…Another stream of research attempts to leverage the Large Language Model (LLM) or large Visual Language Model (VLM) for goal specification. [24], [25] notice the necessity of automatic goal inference for tidying rooms and exploit the commonsense knowledge from LLM or memex graph to infer rearrangement goals when the goal is unspecified. TidyBot [26] also leverages an LLM to summarize the rearrangement preference from a few examples provided by the user.…”

Section: A Object Rearrangement With Functional Requirements (mentioning)

confidence: 99%

“…As a result, the robotics community has started to integrate the pre-trained large models into the robot learning workflow [33], [34], [35], [36], [37], [38], [7], [24], [25], [26]. In navigation, VLMaps [33] and NavGPT [34] leverage the LLM to translate natural language instructions into explicit goals or actions.…”

Section: B Leveraging Large Models For Robot Learning (mentioning)

confidence: 99%


The integration structure enhances performance of perovskite solar cells

et al. 2021


“…Many LLMs have been developed in recent years, such as BERT [24], GPT-3 [12], ChatGPT [13], CodeX [25], and OPT [26]. These LLMs can encode a large amount of common sense [14] and have been applied to robot task planning [27]- [32]. For instance, the work of Huang et.…”

Section: Robot Planning With Large Language Models (mentioning)

confidence: 99%

Task-Motion Planning for Safe and Efficient Urban Driving

Ding, Zhang, Zhan et al. 2020

2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)


Multi-object rearrangement is a crucial skill for service robots, and commonsense reasoning is frequently needed in this process. However, achieving commonsense arrangements requires knowledge about objects, which is hard to transfer to robots. Large language models (LLMs) are one potential source of this knowledge, but they do not naively capture information about plausible physical arrangements of the world. We propose LLM-GROP, which uses prompting to extract commonsense knowledge about semantically valid object configurations from an LLM and instantiates them with a task and motion planner in order to generalize to varying scene geometry. LLM-GROP allows us to go from natural-language commands to human-aligned object rearrangement in varied environments. Based on human evaluations, our approach achieves the highest rating while outperforming competitive baselines in terms of success rate while maintaining comparable cumulative action costs. Finally, we demonstrate a practical implementation of LLM-GROP on a mobile manipulator in real-world scenarios.
