2022
DOI: 10.48550/arXiv.2204.01691
Preprint

Do As I Can, Not As I Say: Grounding Language in Robotic Affordances

Abstract: Large language models can encode a wealth of semantic knowledge about the world. Such knowledge could be extremely useful to robots aiming to act upon high-level, temporally extended instructions expressed in natural language. However, a significant weakness of language models is that they lack real-world experience, which makes it difficult to leverage them for decision making within a given embodiment. For example, asking a language model to describe how to clean a spill might result in a reasonable narrative…
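The grounding idea the abstract points at can be illustrated with a minimal sketch of affordance-weighted skill selection; every name and number below is a hypothetical stand-in for illustration, not the authors' implementation:

```python
# Minimal sketch of the paper's core idea: score each candidate skill by the
# language model's preference ("say") times a learned affordance ("can").
# All names and numbers here are illustrative assumptions.

def select_next_skill(llm_scores: dict[str, float],
                      affordances: dict[str, float]) -> str:
    """Pick the skill maximizing p_LLM(skill) * affordance(skill)."""
    return max(llm_scores, key=lambda s: llm_scores[s] * affordances[s])

# Toy example: the LLM prefers "wipe the spill", but without a sponge in hand
# the affordance model down-weights it, so the robot first finds a sponge.
llm_scores = {"find a sponge": 0.3, "wipe the spill": 0.6, "go to the table": 0.1}
affordances = {"find a sponge": 0.9, "wipe the spill": 0.1, "go to the table": 0.8}
print(select_next_skill(llm_scores, affordances))  # -> "find a sponge"
```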

Cited by 98 publications (191 citation statements) | References 32 publications
“…Combining autoregressive generation with transformers (Devlin et al., 2018) has been of enormous impact in language modelling (Rae et al., 2021), protein folding (Jumper et al., 2021), vision-language models (Alayrac et al., 2022; Tsimpoukelli et al., 2021), code generation (Chen et al., 2021c; Li et al., 2022b), dialogue systems with retrieval capabilities (Nakano et al., 2021; Thoppilan et al., 2022), speech recognition (Pratap et al., 2020), neural machine translation (Johnson et al., 2019) and more (Bommasani et al., 2021). Recently, researchers have explored task decomposition and grounding with language models (Ahn et al., 2022; Huang et al., 2022). Li et al. (2022a) construct a control architecture consisting of a sequence tokenizer, a pretrained language model and a task-specific feed-forward network.…”
Section: Related Work
confidence: 99%
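The control architecture the quoted passage attributes to Li et al. (2022a) can be sketched roughly as follows. This is a hedged illustration under our own assumptions; module names, dimensions, and the stand-in backbone are ours, not the paper's code:

```python
# Hypothetical sketch: tokenized input -> pretrained language model ->
# task-specific feed-forward head. Names and shapes are illustrative.
import torch
import torch.nn as nn

class LMControlPolicy(nn.Module):
    def __init__(self, lm: nn.Module, hidden_dim: int, n_actions: int):
        super().__init__()
        self.lm = lm  # pretrained language model backbone
        self.head = nn.Sequential(  # task-specific feed-forward network
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, n_actions),
        )

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # Assumes the backbone returns per-token hidden states of
        # shape (batch, seq_len, hidden_dim).
        hidden = self.lm(token_ids)
        return self.head(hidden[:, -1])  # predict an action from the last token

toy_lm = nn.Sequential(nn.Embedding(100, 32))  # stand-in backbone for the demo
policy = LMControlPolicy(toy_lm, hidden_dim=32, n_actions=4)
logits = policy(torch.randint(0, 100, (1, 10)))  # -> shape (1, 4)
```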
“…A closely related setting is learning to solve multiple tasks within the same or similar environments. For example, in the robotics field, existing works propose to use language-conditioned tasks [46, 3, 33], while others posit goal-reaching as a way to learn general skills [49], among other proposals [36, 79].…”
Section: Related Work
confidence: 99%
“…The training datasets we use (Section 3.3) contain scalar reward values clipped to [−1, 1]. For return quantization, we use the range {−20, ..., 100} with bin size 1 in all our experiments, as we find it covers most of the returns we observe in the datasets. We use 6×6 patches, where each patch corresponds to 14×14 pixels, in all our experiments.…”
confidence: 99%
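The return quantization the quoted passage describes can be written out in a few lines. The sketch below follows the stated numbers (rewards clipped to [−1, 1], returns binned over {−20, ..., 100} with bin size 1); the function names and the token-index convention are our assumptions:

```python
# Sketch of the quoted return quantization: clip per-step rewards to [-1, 1],
# then discretize episode returns into 121 integer bins over {-20, ..., 100}.
import numpy as np

RETURN_MIN, RETURN_MAX = -20, 100

def clip_reward(r: float) -> float:
    """Clip a scalar reward to [-1, 1], as the quoted passage states."""
    return float(np.clip(r, -1.0, 1.0))

def quantize_return(g: float) -> int:
    """Map a scalar return to a bin index in [0, 120] (bin size 1)."""
    g = int(np.clip(round(g), RETURN_MIN, RETURN_MAX))
    return g - RETURN_MIN  # shift so indices start at 0

print(quantize_return(37.4))   # -> 57
print(quantize_return(-50.0))  # -> 0 (clipped to the lowest bin)
```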
“…They still remain a formidable force for procedural reasoning, especially coupled with quickly advancing neural-symbolic methods [Huang et al., 2021]. Moreover, structured representations are a step towards language grounding, which is crucial to robots executing instructions [Puig et al., 2018, Huang et al., 2022, Ahn et al., 2022]. This area is of course related to procedures and an extremely important front of artificial intelligence, but we will not go into details in this tutorial.…”
Section: Textual Representation of Procedures
confidence: 99%