2022 IEEE 61st Conference on Decision and Control (CDC) 2022
DOI: 10.1109/cdc51059.2022.9992858

A Teacher-Student Markov Decision Process-based Framework for Online Correctional Learning

citations
Cited by 3 publications
(3 citation statements)
references
References 20 publications
“…If no corrections are received while it follows the trajectories throughout the environment, the shifted objects did not actually have an impact on the features so the robot can correctly perform the tasks. If corrections are received but β, computed according to (7), is large, then the features are still correctly represented and the robot just has to update their importance θ for that task as in (8). If, on the other hand, β is small, no θ can explain the corrections and therefore the feature representation φ(R, o₁, .…”
Section: A Diagnosing Misaligned Features (mentioning)
confidence: 99%
“…Learning from corrections is another way of learning from human input that can be a good complement and advantageous in many situations where real-time and task-specific learning is needed [7], [8]. Methods to incorporate corrections in real-time to align robot and human preferences have been shown to improve performance and adaptability for HRI [2], [9], [10], [11].…”
Section: Related Work (mentioning)
confidence: 99%