CraftAssist Instruction Parsing: Semantic Parsing for a Voxel-World Assistant

Srinet, Kavya; Jernite, Yacine; Gray, Jonathan; Szlam, Arthur

doi:10.18653/v1/2020.acl-main.427

Cited by 7 publications

(13 citation statements)

References 29 publications


“…Minecraft as an Environment for Grounded Language Understanding substantiated the advantages of building an open interactive assistant in the sandbox construction game of Minecraft instead of a "real world" assistant, which is inherently complex and inherently costly to develop and maintain. The Minecraft world's constraints (e.g., coarse 3-d voxel grid and simple physics) and the regularities in the head of the distribution of in-game tasks allow numerous scenarios for grounded NLU research [Yao et al, 2020, Srinet et al, 2020]. Minecraft is an appealing competition domain due to its popularity as a video game, of all games ever released, it has the second-most total copies sold.…”

Section: Competition Type (mentioning)

confidence: 99%

NeurIPS 2021 Competition IGLU: Interactive Grounded Language Understanding in a Collaborative Environment

Kiseleva, Li, Aliannejadi et al. 2021

Preprint

Self Cite


Human intelligence has the remarkable ability to quickly adapt to new tasks and environments. Starting from a very young age, humans acquire new skills and learn how to solve new tasks either by imitating the behavior of others or by following provided natural language instructions. To facilitate research in this direction, we propose IGLU: Interactive Grounded Language Understanding in a Collaborative Environment. The primary goal of the competition is to approach the problem of how to build interactive agents that learn to solve a task while provided with grounded natural language instructions in a collaborative environment. Understanding the complexity of the challenge, we split it into sub-tasks to make it feasible for participants. This research challenge is naturally related, but not limited, to two fields of study that are highly relevant to the NeurIPS community: Natural Language Understanding and Generation (NLU/G) and Reinforcement Learning (RL). Therefore, the suggested challenge can bring two communities together to approach one of the important challenges in AI. Another important aspect of the challenge is the dedication to perform a human-in-the-loop evaluation as a final evaluation for the agents developed by contestants.


“…The goal of our work is to learn a mapping from natural language instructions to actions. Some general frameworks for mapping instructions to actions include language-conditioned reinforcement learning [11,8,10], semantic parsers learned from supervision [23,55], and a supervised mapping from instruction to action [42]. Tasks that focus on this problem include instruction guided navigation [4,37,42] and cooperative localization [24].…”

Section: Related Work (mentioning)

confidence: 99%

“…In the general case, we assume that we have access to a parser over a set of core instructions as well as a semantic segmentation and a sequential generation model. We build our virtual agent on top of the CraftAssist framework [22,55]. This software provides the tooling for creating Minecraft sessions and the virtual agent, including the semantic parsing system and semantic segmentation module that make up the core parsing framework.…”

Section: Core Semantic-parsing Framework (mentioning)

confidence: 99%


Neural Abstructions: Abstractions that Support Construction for Grounded Language Learning

Burns, Manning, Fei-Fei 2021

Preprint


Although virtual agents are increasingly situated in environments where natural language is the most effective mode of interaction with humans, these exchanges are rarely used as an opportunity for learning. Leveraging language interactions effectively requires addressing limitations in the two most common approaches to language grounding: semantic parsers built on top of fixed object categories are precise but inflexible and end-to-end models are maximally expressive, but fickle and opaque. Our goal is to develop a system that balances the strengths of each approach so that users can teach agents new instructions that generalize broadly from a single example. We introduce the idea of neural abstructions: a set of constraints on the inference procedure of a label-conditioned generative model that can affect the meaning of the label in context. Starting from a core programming language that operates over abstructions, users can define increasingly complex mappings from natural language to actions. We show that with this method a user population is able to build a semantic parser for an open-ended house modification task in Minecraft. The semantic parser that results is both flexible and expressive: the percentage of utterances sourced from redefinitions increases steadily over the course of 191 total exchanges, achieving a final value of 28%.


“…The droidlet agent described above should be considered an example of how to build a system using the components; but we do not … [Footnote 1: The agent design happily extends to adding audio perception or audio sensory hardware. Footnote 2: After handling text spans; see [40]. Footnote 3: If there are more than one, the agent defaults to the one nearest the human's point/looking-at location (if that is in memory), and otherwise the one nearest the agent.] [Fig. 3]…”

Section: Platform (mentioning)

confidence: 99%

droidlet: modular, heterogenous, multi-modal agents

Pratik, Chintala, Srinet et al. 2021

2021 IEEE International Conference on Robotics and Automation (ICRA)

Self Cite


In recent years, there have been significant advances in building end-to-end Machine Learning (ML) systems that learn at scale. But most of these systems are: (a) isolated (perception, speech, or language only); (b) trained on static datasets. On the other hand, in the field of robotics, large-scale learning has always been difficult. Supervision is hard to gather and real world physical interactions are expensive. In this work we introduce and open-source droidlet, a modular, heterogeneous agent architecture and platform. It allows us to exploit both large-scale static datasets in perception and language and sophisticated heuristics often used in robotics; and provides tools for interactive annotation. Furthermore, it brings together perception, language and action onto one platform, providing a path towards agents that learn from the richness of real world interactions.
