ChatGPT Empowered Long-Step Robot Control in Various Environments: A Case Application

Wake, Naoki; Kanehira, Atsushi; Sasabuchi, Kazuhiro; Takamatsu, Jun; Ikeuchi, Katsushi

doi:10.1109/access.2023.3310935

Cited by 38 publications

(11 citation statements)

References 35 publications


“…cooking operation, increases the confusion rate for STG3. Since we conducted this experiment, ChatGPT has recently been publicly available, which can greatly improve the performance (Wake et al, 2023). In summary, this section has investigated the possibility of recognizing task groups from verbal instructions.…”

Section: Results (mentioning)

confidence: 99%

“…In parallel with this, we have recently obtained preliminary but highly promising results on the possibility of decomposing common input sentences into task groups by providing appropriate prompts to ChatGPT. Here, we were able to generate a sequence of task models that could be performed by the robot by providing, as prompts, an environment description, a set of skills from the skill library, and the role of the ChatGPT (Wake et al, 2023). However, the skill parameters were obtained from the human demonstrations.…”

Section: Discussion: Verbal Input (mentioning)

confidence: 99%


Semantic constraints to represent common sense required in household actions for multimodal learning-from-observation robot

Ikeuchi, Wake, Sasabuchi et al. 2023

The International Journal of Robotics Research


The learning-from-observation (LfO) paradigm allows a robot to learn how to perform actions by observing human actions. Previous research in top-down learning-from-observation has mainly focused on the industrial domain, which consists only of the real physical constraints between a manipulated tool and the robot’s working environment. To extend this paradigm to the household domain, which consists of imaginary constraints derived from human common sense, we introduce the idea of semantic constraints, which are represented similarly to the physical constraints by defining an imaginary contact with an imaginary environment. By studying the transitions between contact states under physical and semantic constraints, we derive a necessary and sufficient set of task representations that provides the upper bound of the possible task set. We then apply the task representations to analyze various actions in top-rated household YouTube videos and real home cooking recordings, classify frequently occurring constraint patterns into physical, semantic, and multi-step task groups, and determine a subset that covers standard household actions. Finally, we design and implement task models, corresponding to these task representations in the subset, with the necessary daemon functions to collect the necessary parameters to perform the corresponding household actions. Our results provide promising directions for incorporating common sense into the robot teaching literature.


“…In contrast, inner monologue [8] feeds back execution results and observations into the LLM, but does not rely on code-writing, thus missing its combinatorial power. Recent technical reports [33], [34] provide guidance on utilizing ChatGPT [4] for robot orchestration. While TidyBot [35] uses GPT-3 [2] in a similar way to generate high-level plans for tidying up a cluttered real-world environment, the authors focus on personalization by summarizing and thereby generalizing individual object placement rules.…”

Section: B. Orchestrating Robot Behavior With LLMs (mentioning)

confidence: 99%

“…With our proposed emulated Python console prompting, we differ from these existing works by (i) formatting and interpreting all interaction with the LLM as Python code, in contrast to [6], [8], (ii) closing the interaction loop by enabling the LLM to reason about each perception and action outcome, in contrast to [7], [32], [34], [31], [6], (iii) allowing the LLM to decide itself when and which perception primitives to invoke, instead of providing a predefined list of observations (usually a list of objects in the scene) as part of the prompt as in [31], [8], [32], [7], [35], and (iv) simplifying the task for the LLM by allowing it to generate one statement at a time, in contrast to [7], [32], [33].…”

Section: B. Orchestrating Robot Behavior With LLMs (mentioning)

confidence: 99%

Deep Episodic Memory for Verbalization of Robot Experience

Bärmann, Peller-Konrad, Constantin et al. 2021

IEEE Robot. Autom. Lett.


Natural-language dialog is key for intuitive human-robot interaction. It can be used not only to express humans' intents, but also to communicate instructions for improvement if a robot does not understand a command correctly. Of great importance is to endow robots with the ability to learn from such interaction experience in an incremental way to allow them to improve their behaviors or avoid mistakes in the future. In this paper, we propose a system to achieve incremental learning of complex behavior from natural interaction, and demonstrate its implementation on a humanoid robot. Building on recent advances, we present a system that deploys Large Language Models (LLMs) for high-level orchestration of the robot's behavior, based on the idea of enabling the LLM to generate Python statements in an interactive console to invoke both robot perception and action. The interaction loop is closed by feeding back human instructions, environment observations, and execution results to the LLM, thus informing the generation of the next statement. Specifically, we introduce incremental prompt learning, which enables the system to interactively learn from its mistakes. For that purpose, the LLM can call another LLM responsible for code-level improvements of the current interaction based on human feedback. The improved interaction is then saved in the robot's memory, and thus retrieved on similar requests. We integrate the system in the robot cognitive architecture of the humanoid robot ARMAR-6 and evaluate our methods both quantitatively (in simulation) and qualitatively (in simulation and real-world) by demonstrating generalized incrementally-learned knowledge.


“…In light of the substantial progress in Large Language Models (LLM) and Vision Language Models (VLM) in recent years, a number of methodologies have emerged that convert language/visual input into robotic manipulative actions. While a mainstream approach is training custom models based on extensive data of robot actions [1]- [7], several studies have explored the use of general-purpose, off-the-shelf language models such as ChatGPT [8] and GPT-4 [9] through prompt engineering without additional training [10]- [17]. One key advantage of using off-the-shelf models is their flexibility; they can be adapted to various robotic hardware configurations and functionalities simply by modifying prompts.…”

Section: Introduction (mentioning)

confidence: 99%

Interactive Task Encoding System for Learning-from-Observation

Wake, Kanehira, Sasabuchi et al. 2023

2023 IEEE/ASME International Conference on Advanced Intelligent Mechatronics (AIM)

Self Cite


This technical report explores the ability of ChatGPT in recognizing emotions from text, which can be the basis of various applications like interactive chatbots, data annotation, and mental health analysis. While prior research has shown ChatGPT's basic ability in sentiment analysis, its performance in more nuanced emotion recognition is not yet explored. Here, we conducted experiments to evaluate its performance of emotion recognition across different datasets and emotion labels. Our findings indicate a reasonable level of reproducibility in its performance, with noticeable improvement through fine-tuning. However, the performance varies with different emotion labels and datasets, highlighting an inherent instability and possible bias. The choice of dataset and emotion labels significantly impacts ChatGPT's emotion recognition performance. This paper sheds light on the importance of dataset and label selection, and the potential of fine-tuning in enhancing ChatGPT's emotion recognition capabilities, providing a groundwork for better integration of emotion analysis in applications using ChatGPT.
