2021 IEEE International Conference on Robotics and Automation (ICRA)
DOI: 10.1109/icra48506.2021.9560829

Learning Human Objectives from Sequences of Physical Corrections

Abstract: When personal, assistive, and interactive robots make mistakes, humans naturally and intuitively correct those mistakes through physical interaction. In simple situations, one correction is sufficient to convey what the human wants. But when humans are working with multiple robots, or when the robot is performing an intricate task, the human must often make several corrections to fix the robot's behavior. Prior research assumes each of these physical corrections is an independent event and learns from them one-at-a-time…
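The abstract stops short of the method itself, but the core idea (treating a sequence of corrections as connected evidence about one underlying objective) can be gestured at with a small sketch. Everything below is an illustrative assumption rather than the paper's implementation: a linear reward over hand-picked trajectory features, a Boltzmann-rational correction likelihood, and a discrete hypothesis set over objective weights.

```python
# Minimal sketch (NOT the authors' implementation) of learning an objective
# from a sequence of physical corrections. The feature model, the Boltzmann
# likelihood, and the hypothesis set are all illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Hypothesis space: candidate objective weights theta over two trajectory
# features (e.g., "distance to table", "cup orientation").
thetas = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
belief = np.ones(len(thetas)) / len(thetas)  # uniform prior over objectives

def features(traj):
    """Toy feature vector of a trajectory (here, just its mean state)."""
    return traj.mean(axis=0)

def correction_likelihood(theta, before, after, beta=5.0):
    """Boltzmann-rational model: a correction from `before` to `after` is
    more likely when it increases reward under theta."""
    gain = theta @ (features(after) - features(before))
    return 1.0 / (1.0 + np.exp(-beta * gain))

# A sequence of corrections, each a (trajectory before, trajectory after)
# pair. Note the second correction starts from the first correction's
# result, so the corrections form a chain rather than isolated events.
corrections = [
    (rng.normal(0.0, 0.1, (10, 2)), rng.normal([0.5, 0.0], 0.1, (10, 2))),
    (rng.normal([0.5, 0.0], 0.1, (10, 2)), rng.normal([0.9, 0.1], 0.1, (10, 2))),
]

# Accumulate evidence from the whole sequence into one posterior.
for before, after in corrections:
    likelihoods = np.array(
        [correction_likelihood(t, before, after) for t in thetas]
    )
    belief *= likelihoods
belief /= belief.sum()

print("Posterior over candidate objectives:", np.round(belief, 3))
```

The chain structure (each correction starting from the previous result) is the part this toy model only gestures at; the paper's contribution concerns how to reason about the dependencies between corrections in such a sequence, which goes beyond what this sketch captures.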

Cited by 16 publications (3 citation statements). References 23 publications.
“…Notably, correctional feedback has been used in robotics applications (Li et al., 2021; Losey et al., 2021; Bajcsy et al., 2018; Ross et al., 2011). However, these robotics-focused works aim to improve a trajectory or a sequence of actions, which is a multi-step bandit problem.…”
Section: Types of Human Feedback (mentioning)
Confidence: 99%
“…Layout designs are used by engineers and designers to produce vectorized arrangements and models. There are several commonly used datasets for layout modeling, including PubLayNet (Zhong et al., 2019), CLAY (Li et al., 2021), and RICO-Semantic (Sunkara et al., 2022). In this paper, we focus on UI layouts, which consist of a collection of UI elements.…”
Section: Generative Layout Models (mentioning)
Confidence: 99%
“…RLHF is a type of reinforcement learning that learns from human feedback, which encourages the alignment of learning objectives with human values [185]. Existing feedback mechanisms are diverse, suiting different learning objectives, and include critiques [143, 371, 382], comparisons [88, 254, 362, 374], improvements [153, 187, 196], natural language [2, 199, 337, 383], etc. However, these approaches mostly focus on improving model performance, neglecting explainability.…”
Section: Neural Sentiment Analysis (mentioning)
Confidence: 99%