iGibson 2.0: Object-Centric Simulation for Robot Learning of Everyday Household Tasks

Li, Chengshu; Xia, Fei; Martín-Martín, Roberto; Lingelbach, Michael; Srivastava, Sanvesh; Shen, Bokui; Vainio, Kent; Gokmen, Cem; Dharan, Gokul; Jain, Tanish; Kurenkov, Andrey; Liu, C. Karen; Gweon, Hyowon; Li, Feifei; Savarese, Silvio

doi:10.48550/arxiv.2108.03272

Cited by 14 publications

(16 citation statements)

References 32 publications

(45 reference statements)

“…Physical simulators have become a vital tool for embodied AI research. A growing trend is shifting from static 3D scenes for visual navigation [23,46,59] to interactive environments that support physical interaction between the robot and the objects [9,24,52]. Interactive 3D assets are the key elements to construct these simulators.…”

Section: Related Work (mentioning)

confidence: 99%

“…Recent efforts in embodied AI platforms [23,24,52] have incorporated interactive articulated objects, such as cabinets and drawers, in simulated household environments and employed them for training virtual agents. Even so, they heavily rely on graphics designers and engineers to author and curate the object models, limiting the scalability of the asset acquisition process.…”

Section: Introduction (mentioning)

confidence: 99%

Ditto: Building Digital Twins of Articulated Objects from Interaction

Jiang, Hsu, Zhu. 2022. Preprint.

Digitizing physical objects into the virtual world has the potential to unlock new research and applications in embodied AI and mixed reality. This work focuses on recreating interactive digital twins of real-world articulated objects, which can be directly imported into virtual environments. We introduce Ditto to learn articulation model estimation and 3D geometry reconstruction of an articulated object through interactive perception. Given a pair of visual observations of an articulated object before and after interaction, Ditto reconstructs part-level geometry and estimates the articulation model of the object. We employ implicit neural representations for joint geometry and articulation modeling. Our experiments show that Ditto effectively builds digital twins of articulated objects in a category-agnostic way. We also apply Ditto to real-world objects and deploy the recreated digital twins in physical simulation. Code and additional results are available at https://ut-austin-rpl.github.io/Ditto/

“…Recent progress in Embodied Artificial Intelligence spans both simulation environments Kolve et al (2017); Li et al (2021); Savva et al (2019); Gan et al (2020); Puig et al (2018) and sophisticated tasks Das et al (2018); Anderson et al (2018); Shridhar et al (2020). Our work is most closely related to research in language-guided task completion, Neural SLAM, and exploration.…”

Section: Related Work (mentioning)

confidence: 99%

Learning to Act with Affordance-Aware Multimodal Neural SLAM

Jia, Lin, Zhao, et al. 2022. Preprint.

Recent years have witnessed an emerging paradigm shift toward embodied artificial intelligence, in which an agent must learn to solve challenging tasks by interacting with its environment. There are several challenges in solving embodied multimodal tasks, including long-horizon planning, vision-and-language grounding, and efficient exploration. We focus on a critical bottleneck, namely the performance of planning and navigation. To tackle this challenge, we propose a Neural SLAM approach that, for the first time, utilizes several modalities for exploration, predicts an affordance-aware semantic map, and plans over it at the same time. This significantly improves exploration efficiency, leads to robust long-horizon planning, and enables effective vision-and-language grounding. With the proposed Affordance-aware Multimodal Neural SLAM (AMSLAM) approach, we obtain more than 40% improvement over prior published work on the ALFRED benchmark and set a new state-of-the-art generalization performance at a success rate of 23.48% on the test unseen scenes.

“…Embodied artificial intelligence (EAI) has attracted significant attention, both in advanced deep learning models and algorithms [1,2,3,4] and the rapid development of simulated platforms [5,6,7,8,9]. Many open challenges [10,11,12,13] have been proposed to facilitate EAI research.…”

Section: Introduction (mentioning)

confidence: 99%

“…Many open challenges [10,11,12,13] have been proposed to facilitate EAI research. A critical bottleneck in existing simulated platforms [10,12,8,5,14] is the limited number of indoor scenes that support vision-and-language navigation, object interaction, and complex household tasks. This limitation makes it difficult to verify whether state-of-the-art methods generalize well to unseen scenarios or whether they are specialized to a small number of room structures.…”

Section: Introduction (mentioning)

confidence: 99%

LUMINOUS: Indoor Scene Generation for Embodied AI Challenges

Zhao, Lin, Jia, et al. 2021. Preprint.

Learning-based methods for training embodied agents typically require a large number of high-quality scenes that contain realistic layouts and support meaningful interactions. However, current simulators for Embodied AI (EAI) challenges only provide simulated indoor scenes with a limited number of layouts. This paper presents LUMINOUS, the first research framework that employs state-of-the-art indoor scene synthesis algorithms to generate large-scale simulated scenes for Embodied AI challenges. Further, we automatically and quantitatively evaluate the quality of generated indoor scenes via their ability to support complex household tasks. LUMINOUS incorporates a novel scene generation algorithm, Constrained Stochastic Scene Generation (CSSG), which achieves competitive performance with human-designed scenes. Within LUMINOUS, the EAI task executor, task instruction generation module, and video rendering toolkit can collectively generate a massive multimodal dataset of new scenes for the training and evaluation of Embodied AI agents. Extensive experimental results demonstrate the effectiveness of the data generated by LUMINOUS, enabling the comprehensive assessment of embodied agents on generalization and robustness. The full codebase and documentation of LUMINOUS are available at: https://github.com/amazon-research/indoor-scene-generation-eai/.
