2022
DOI: 10.3389/frobt.2022.779194

Koopman Operator–Based Knowledge-Guided Reinforcement Learning for Safe Human–Robot Interaction

Abstract: We developed a novel framework for deep reinforcement learning (DRL) algorithms in task-constrained path-generation problems of robotic manipulators, leveraging human-demonstrated trajectories. The main contribution of this article is the design of a reward function that can be used with generic reinforcement learning algorithms, built by utilizing Koopman operator theory to construct a human intent model from the demonstrated trajectories. In order to ensure that the developed reward function produces the correct r…
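To make the abstract's mechanism concrete, the sketch below shows one common way such a Koopman-based intent model can be built from demonstrations and turned into a reward: lift states through an observable function, fit a linear operator by least squares on the demonstrated transitions, and penalize deviation from the model's one-step prediction. This is a minimal illustration under assumed details, not the paper's implementation; `lift`, `fit_koopman`, and `intent_reward` are hypothetical names, and the quadratic lifting is an arbitrary choice.

```python
import numpy as np

def lift(x):
    # Hypothetical observable/lifting function: the state itself plus
    # its pairwise quadratic terms (upper triangle of the outer product).
    return np.concatenate([x, np.outer(x, x)[np.triu_indices(len(x))]])

def fit_koopman(demos):
    """Fit a finite-dimensional Koopman approximation K from demonstrated
    trajectories, so that lift(x_{t+1}) ~= K @ lift(x_t)."""
    X = np.column_stack([lift(x) for traj in demos for x in traj[:-1]])
    Y = np.column_stack([lift(x) for traj in demos for x in traj[1:]])
    # Least-squares closed form K = Y X^+, which minimizes ||Y - K X||_F.
    return Y @ np.linalg.pinv(X)

def intent_reward(K, x_t, x_next):
    """Hypothetical shaping term: reward an agent transition by how closely
    it matches the intent model's one-step prediction."""
    pred = K @ lift(x_t)
    return -np.linalg.norm(lift(x_next) - pred)
```

A reward of this shape can be added to a generic RL algorithm's task reward without modifying the algorithm itself, which is the kind of algorithm-agnostic use the abstract describes.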

Cited by 6 publications (8 citation statements)
References 19 publications
“…In the subsequent section, we draw comparisons with two methods based on the Decision Transformer: the original Decision Transformer (DT) [11] and the Q-learning Decision Transformer (QDT) [57]. Additionally, we include a behavior cloning-based method (TS+BC) [19], as well as two offline Q-learning methods, namely S4RL [45] and IQL [26], in our comparisons.…”
Section: Baseline Methods
confidence: 99%
“…Such techniques can be applied to state representations to generate unseen data points. In [31,29], different data augmentation schemes are compared and applied to off-the-shelf RL algorithms. However, naively applying data augmentation to DL/RL can cause new problems.…”
Section: Incomplete Data
confidence: 99%
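As an illustration of the state-augmentation idea in this statement, the following minimal sketch perturbs observed states with zero-mean Gaussian noise to generate unseen nearby data points, in the spirit of S4RL-style augmentation; the function name, noise scale, and copy count are hypothetical choices, not values from the cited works.

```python
import numpy as np

def augment_states(states, sigma=0.003, copies=4, rng=None):
    """Generate augmented state data by adding zero-mean Gaussian noise
    to each observed state (illustrative sigma/copies, not from the paper)."""
    rng = np.random.default_rng(0) if rng is None else rng
    noisy = [states + rng.normal(0.0, sigma, size=states.shape)
             for _ in range(copies)]
    return np.concatenate([states] + noisy, axis=0)
```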
“…Similarly, end-effector motion of industrial robotic arms around humans requires environmental state prediction for safe path planning. This was done in one study where the Koopman operator was directly solved for by taking the pseudo-inverse involved in minimization of the Frobenius norm (a computationally expensive operation) [33]. The same objective of safe path planning was also achieved in [12] with the use of the adjoint Koopman operator and in [94] using the Stochastic Koopman Operator.…”
Section: Human-Robot Collaboration
confidence: 99%
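The pseudo-inverse step this statement refers to has a standard closed form: the finite-dimensional Koopman approximation K minimizing the Frobenius norm ||Psi' − K Psi||_F over snapshot matrices of lifted states is K = Psi' Psi^+. Below is a minimal sketch of that computation, plus an equivalent least-squares solve that avoids forming the pseudo-inverse explicitly; the matrix names are illustrative, and the cost remark reflects the citing survey's observation rather than a benchmark.

```python
import numpy as np

# Psi_t holds lifted states psi(x_t) as columns; Psi_next holds psi(x_{t+1}).
# The approximation solves  K* = argmin_K ||Psi_next - K Psi_t||_F.

def koopman_pinv(Psi_t, Psi_next):
    # Closed form via the SVD-based pseudo-inverse; this is the
    # computationally expensive operation flagged in the citing survey.
    return Psi_next @ np.linalg.pinv(Psi_t)

def koopman_lstsq(Psi_t, Psi_next):
    # Equivalent least-squares solve of Psi_t^T K^T = Psi_next^T,
    # without explicitly materializing the pseudo-inverse.
    K_T, *_ = np.linalg.lstsq(Psi_t.T, Psi_next.T, rcond=None)
    return K_T.T
```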