2022
DOI: 10.48550/arXiv.2204.01691
Preprint

Do As I Can, Not As I Say: Grounding Language in Robotic Affordances

Abstract: Large language models can encode a wealth of semantic knowledge about the world. Such knowledge could be extremely useful to robots aiming to act upon high-level, temporally extended instructions expressed in natural language. However, a significant weakness of language models is that they lack real-world experience, which makes it difficult to leverage them for decision making within a given embodiment. For example, asking a language model to describe how to clean a spill might result in a reasonable narrative…
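The grounding idea the abstract points at can be illustrated with a minimal sketch of affordance-weighted skill selection; every name and number below is a hypothetical stand-in for illustration, not the authors' implementation:

```python
# Minimal sketch of the paper's core idea: score each candidate skill by the
# language model's preference ("say") times a learned affordance ("can").
# All names and numbers here are illustrative assumptions.

def select_next_skill(llm_scores: dict[str, float],
                      affordances: dict[str, float]) -> str:
    """Pick the skill maximizing p_LLM(skill) * affordance(skill)."""
    return max(llm_scores, key=lambda s: llm_scores[s] * affordances[s])

# Toy example: the LLM prefers "wipe the spill", but without a sponge in hand
# the affordance model down-weights it, so the robot first finds a sponge.
llm_scores = {"find a sponge": 0.3, "wipe the spill": 0.6, "go to the table": 0.1}
affordances = {"find a sponge": 0.9, "wipe the spill": 0.1, "go to the table": 0.8}
print(select_next_skill(llm_scores, affordances))  # -> "find a sponge"
```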

Cited by 98 publications (191 citation statements) | References 32 publications
“…Combining autoregressive generation with transformers (Devlin et al., 2018) has been of enormous impact in language modelling (Rae et al., 2021), protein folding (Jumper et al., 2021), vision-language models (Alayrac et al., 2022; Tsimpoukelli et al., 2021), code generation (Chen et al., 2021c; Li et al., 2022b), dialogue systems with retrieval capabilities (Nakano et al., 2021; Thoppilan et al., 2022), speech recognition (Pratap et al., 2020), neural machine translation (Johnson et al., 2019) and more (Bommasani et al., 2021). Recently, researchers have explored task decomposition and grounding with language models (Ahn et al., 2022; Huang et al., 2022). Li et al. (2022a) construct a control architecture consisting of a sequence tokenizer, a pretrained language model and a task-specific feed-forward network.…”
Section: Related Work
confidence: 99%
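The control architecture the quoted passage attributes to Li et al. (2022a) can be sketched roughly as follows. This is a hedged illustration under our own assumptions; module names, dimensions, and the stand-in backbone are ours, not the paper's code:

```python
# Hypothetical sketch: tokenized input -> pretrained language model ->
# task-specific feed-forward head. Names and shapes are illustrative.
import torch
import torch.nn as nn

class LMControlPolicy(nn.Module):
    def __init__(self, lm: nn.Module, hidden_dim: int, n_actions: int):
        super().__init__()
        self.lm = lm  # pretrained language model backbone
        self.head = nn.Sequential(  # task-specific feed-forward network
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, n_actions),
        )

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # Assumes the backbone returns per-token hidden states of
        # shape (batch, seq_len, hidden_dim).
        hidden = self.lm(token_ids)
        return self.head(hidden[:, -1])  # predict an action from the last token

toy_lm = nn.Sequential(nn.Embedding(100, 32))  # stand-in backbone for the demo
policy = LMControlPolicy(toy_lm, hidden_dim=32, n_actions=4)
logits = policy(torch.randint(0, 100, (1, 10)))  # -> shape (1, 4)
```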
“…A closely related setting is learning to solve multiple tasks within the same or similar environments. For example, in the robotics field, existing works propose to use language-conditioned tasks [46, 3, 33], while others posit goal-reaching as a way to learn general skills [49], among other proposals [36, 79].…”
Section: Related Work
confidence: 99%
“…The training datasets we use (Section 3.3) contain scalar reward values clipped to [−1, 1]. For return quantization, we use the range {−20, ..., 100} with bin size 1 in all our experiments, as we find it covers most of the returns we observe in the datasets. We use 6×6 patches, where each patch corresponds to 14×14 pixels, in all our experiments.…”
confidence: 99%
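The return quantization the quoted passage describes can be written out in a few lines. The sketch below follows the stated numbers (rewards clipped to [−1, 1], returns binned over {−20, ..., 100} with bin size 1); the function names and the token-index convention are our assumptions:

```python
# Sketch of the quoted return quantization: clip per-step rewards to [-1, 1],
# then discretize episode returns into 121 integer bins over {-20, ..., 100}.
import numpy as np

RETURN_MIN, RETURN_MAX = -20, 100

def clip_reward(r: float) -> float:
    """Clip a scalar reward to [-1, 1], as the quoted passage states."""
    return float(np.clip(r, -1.0, 1.0))

def quantize_return(g: float) -> int:
    """Map a scalar return to a bin index in [0, 120] (bin size 1)."""
    g = int(np.clip(round(g), RETURN_MIN, RETURN_MAX))
    return g - RETURN_MIN  # shift so indices start at 0

print(quantize_return(37.4))   # -> 57
print(quantize_return(-50.0))  # -> 0 (clipped to the lowest bin)
```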
“…They still remain a formidable force for procedural reasoning, especially coupled with quickly advancing neural-symbolic methods [Huang et al., 2021]. Moreover, structured representations are a step towards language grounding, which is crucial to robots executing instructions [Puig et al., 2018, Huang et al., 2022, Ahn et al., 2022]. This area is of course related to procedures and an extremely important front of artificial intelligence, but we will not go into details in this tutorial.…”
Section: Textual Representation of Procedures
confidence: 99%