2017
DOI: 10.48550/arxiv.1711.11017
Preprint

HoME: a Household Multimodal Environment

Simon Brodeur,
Ethan Perez,
Ankesh Anand
et al.

Abstract: We introduce HoME: a Household Multimodal Environment for artificial agents to learn from vision, audio, semantics, physics, and interaction with objects and other agents, all within a realistic context. HoME integrates over 45,000 diverse 3D house layouts based on the SUNCG dataset, a scale which may facilitate learning, generalization, and transfer. HoME is an open-source, OpenAI Gym-compatible platform extensible to tasks in reinforcement learning, language grounding, sound-based navigation, robotics, multi-…
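Since the abstract describes HoME as OpenAI Gym-compatible, interacting with it should follow the standard Gym episode loop. The sketch below runs a random agent for one episode; note that the environment id "HoME-v0" is hypothetical (the platform's actual registration name and observation contents are not specified here), and the classic four-tuple step API of Gym-era environments is assumed.

```python
import gym

# Hypothetical id; HoME's actual registered environment name may differ.
env = gym.make("HoME-v0")

obs = env.reset()          # initial observation (e.g., vision/audio modalities)
done = False
total_reward = 0.0
while not done:
    action = env.action_space.sample()            # random policy, for illustration only
    obs, reward, done, info = env.step(action)    # classic 4-tuple Gym step API
    total_reward += reward
env.close()
print(f"Episode return: {total_reward}")
```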

Cited by 31 publications (42 citation statements) | References 25 publications

“…Embodied AI Environments. The recent surge of research in embodied perception is greatly driven by new 3D environments [90, 4, 13, 10, 77] and simulation platforms, such as iGibsonV1 [89], AI2-THOR [45], and Habitat [71]. Compared with grid-like or game environments [96, 70, 7, 37], these openly accessible, photo-realistic platforms bring perception and planning into a closed loop and make the training and testing of embodied agents reproducible [17].…”
Section: Related Work
confidence: 99%
“…Compared to GPS-based navigation, VLN accepts surrounding environments as visual inputs and correlates them with instructions in human language. Most earlier VLN datasets are based on synthesized 3D scenes (Kolve et al. 2017; Brodeur et al. 2017; Wu et al. 2018; Yan et al. 2018; Song et al. 2017).…”
Section: Vision-and-Language Navigation Datasets
confidence: 99%
“…To support research in embodied AI, various simulated environments have emerged for the community's benefit. These 3D environments are created from either synthetic scenes [21, 44, 47–50] or real photographs [51–54]; some of them use game engines to enable physical interactions [21, 44, 55–57]. In this paper, we choose AI2-THOR, which uses Unity as the physics engine, and build the environment on it.…”
Section: Simulated Environments
confidence: 99%