2019
DOI: 10.1177/0278364919871998

Reinforcement learning of motor skills using Policy Search and human corrective advice

Abstract: Robot learning problems are limited by physical constraints, which make learning successful policies for complex motor skills on real systems infeasible. Some reinforcement learning methods, like Policy Search, offer stable convergence toward locally optimal solutions, whereas interactive machine learning or learning-from-demonstration methods allow fast transfer of human knowledge to the agents. However, most methods require expert demonstrations. In this work, we propose the use of human corrective advice in…
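The abstract (truncated above) describes combining episodic Policy Search with per-step human corrective advice. As a loose illustration of that general idea only, the toy sketch below alternates a corrective-advice phase, in which a simulated teacher emits +1/-1 signals that nudge a linear policy's parameters, with a simple perturb-and-keep policy-search step. The task, the simulated teacher, and the update rules are assumptions for illustration, not the algorithm from the paper.

```python
import numpy as np

# Hypothetical illustration only: a toy 1-D target-reaching task where a linear
# policy is improved by (a) per-step human corrective advice (+1 / -1) and
# (b) episodic policy search over its parameters. Nothing here is taken from
# the paper; it is a generic sketch of the combination described in the abstract.

rng = np.random.default_rng(0)
theta = np.zeros(2)          # policy parameters: action = theta @ [state, 1]
advice_gain = 0.05           # step size applied to human corrections
noise_std = 0.1              # exploration noise for the policy-search step


def policy(state, params):
    return params @ np.array([state, 1.0])


def human_advice(state, action, target=1.0):
    """Simulated corrective teacher: signals the desired direction of change."""
    desired = target - state                # the 'human' wants a proportional push
    return np.sign(desired - action)        # +1: increase the action, -1: decrease it


def rollout(params, steps=20):
    state, ret = 0.0, 0.0
    for _ in range(steps):
        action = policy(state, params)
        state += 0.1 * action               # toy dynamics
        ret -= (1.0 - state) ** 2           # reward: stay close to the target 1.0
    return ret


for episode in range(200):
    # (a) corrective-advice phase: teacher feedback nudges the parameters
    state = 0.0
    for _ in range(20):
        action = policy(state, theta)
        h = human_advice(state, action)
        theta += advice_gain * h * np.array([state, 1.0])   # gradient-like nudge
        state += 0.1 * action

    # (b) policy-search phase: keep a random perturbation only if it helps
    candidate = theta + rng.normal(0.0, noise_std, size=theta.shape)
    if rollout(candidate) > rollout(theta):
        theta = candidate

print("learned parameters:", theta, "return:", rollout(theta))
```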

Cited by 22 publications (20 citation statements). References 34 publications (56 reference statements).
“…Consequently, many advice-taking systems combine different learning modalities in order to balance between autonomy and control. For example, RL can be augmented with evaluative feedback (Judah et al, 2010; Sridharan, 2011; Knox and Stone, 2012b), corrective feedback (Celemin et al, 2019), instructions (Maclin and Shavlik, 1996; Kuhlmann et al, 2004; Rosenstein et al, 2004; Pradyot et al, 2012b), instructions and evaluative feedback (Najar et al, 2020b), demonstrations (Taylor et al, 2011; Subramanian et al, 2016), demonstrations and evaluative feedback (Leon et al, 2011), or demonstrations, evaluative feedback, and instructions (Tenorio-Gonzalez et al, 2010). Demonstrations can be augmented with corrective feedback (Chernova and Veloso, 2009; Argall et al, 2011), instructions (Rybski et al, 2007), instructions and feedback, both evaluative and corrective (Nicolescu and Mataric, 2003), or with prior RL (Syed and Schapire, 2007).…”
Section: Discussion
confidence: 99%
“…Another line of work is to consider human prior knowledge of task decomposition to achieve a form of curriculum learning for more complex tasks (Wang et al, 2020). Human input to RL has also been used in combination with policy search methods and to improve robot skills on a trajectory level (Celemin and Ruiz-del Solar, 2016, 2019; Celemin et al, 2019). This is also very relevant for robotic applications, however, it should be noted that in this paper we focus only on the sequencing of skills as high-level actions.…”
Section: Related Work
confidence: 99%
“…For example, RL can be augmented with evaluative feedback [51,106,60], corrective feedback [20], instructions [79,65,102,99], instructions and evaluative feedback [90], demonstrations [112,109], demonstrations and evaluative feedback [66], or demonstrations, evaluative feedback and instructions [114]. Demonstrations can be augmented with corrective feedback [25,6], instructions [103], instructions and feedback, both evaluative and corrective [95], or with prior Reinforcement Learning [111].…”
Section: Toward a Unified View
confidence: 99%