2023
DOI: 10.1109/access.2023.3310935

ChatGPT Empowered Long-Step Robot Control in Various Environments: A Case Application

Naoki Wake,
Atsushi Kanehira,
Kazuhiro Sasabuchi
et al.

Abstract: This paper introduces a novel method for translating natural-language instructions into executable robot actions using OpenAI's ChatGPT in a few-shot setting. We propose customizable input prompts for ChatGPT that can easily integrate with robot execution systems or visual recognition programs, adapt to various environments, and create multi-step task plans while mitigating the impact of the token limit imposed on ChatGPT. In our approach, ChatGPT receives both instructions and textual environmental data, and outp…

Cited by 38 publications (11 citation statements)
References 35 publications
“…cooking operation, increases the confusion rate for STG3. Since we conducted this experiment, ChatGPT has recently been publicly available, which can greatly improve the performance (Wake et al, 2023). In summary, this section has investigated the possibility of recognizing task groups from verbal instructions.…”
Section: Results (mentioning)
confidence: 99%
“…In parallel with this, we have recently obtained preliminary but highly promising results on the possibility of decomposing common input sentences into task groups by providing appropriate prompts to ChatGPT. Here, we were able to generate a sequence of task models that could be performed by the robot by providing, as prompts, an environment description, a set of skills from the skill library, and the role of the ChatGPT (Wake et al, 2023). However, the skill parameters were obtained from the human demonstrations.…”
Section: Discussion: Verbal Input (mentioning)
confidence: 99%
“…In contrast, inner monologue [8] feeds back execution results and observations into the LLM, but does not rely on code-writing, thus missing its combinatorial power. Recent technical reports [33], [34] provide guidance on utilizing ChatGPT [4] for robot orchestration. While TidyBot [35] uses GPT-3 [2] in a similar way to generate high-level plans for tidying up a cluttered real-world environment, the authors focus on personalization by summarizing and thereby generalizing individual object placement rules.…”
Section: B. Orchestrating Robot Behavior With LLMs (mentioning)
confidence: 99%
“…With our proposed emulated Python console prompting, we differ from these existing works by (i) formatting and interpreting all interaction with the LLM as Python code, in contrast to [6], [8], (ii) closing the interaction loop by enabling the LLM to reason about each perception and action outcome, in contrast to [7], [32], [34], [31], [6], (iii) allowing the LLM to decide itself when and which perception primitives to invoke, instead of providing a predefined list of observations (usually a list of objects in the scene) as part of the prompt as in [31], [8], [32], [7], [35], and (iv) simplifying the task for the LLM by allowing it to generate one statement at a time, in contrast to [7], [32], [33].…”
Section: B. Orchestrating Robot Behavior With LLMs (mentioning)
confidence: 99%
“…In light of the substantial progress in Large Language Models (LLM) and Vision Language Models (VLM) in recent years, a number of methodologies have emerged that convert language/visual input into robotic manipulative actions. While a mainstream approach is training custom models based on extensive data of robot actions [1]-[7], several studies have explored the use of general-purpose, off-the-shelf language models such as ChatGPT [8] and GPT-4 [9] through prompt engineering without additional training [10]-[17]. One key advantage of using off-the-shelf models is their flexibility; they can be adapted to various robotic hardware configurations and functionalities simply by modifying prompts.…”
Section: Introduction (mentioning)
confidence: 99%