2019
DOI: 10.1177/0278364919871998

Reinforcement learning of motor skills using Policy Search and human corrective advice

Abstract: Robot learning problems are limited by physical constraints, which make learning successful policies for complex motor skills on real systems infeasible. Some reinforcement learning methods, like Policy Search, offer stable convergence toward locally optimal solutions, whereas interactive machine learning or learning-from-demonstration methods allow fast transfer of human knowledge to the agents. However, most methods require expert demonstrations. In this work, we propose the use of human corrective advice in…
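The abstract (truncated above) describes combining episodic Policy Search with per-step human corrective advice. As a loose illustration of that general idea only, the toy sketch below alternates a corrective-advice phase, in which a simulated teacher emits +1/-1 signals that nudge a linear policy's parameters, with a simple perturb-and-keep policy-search step. The task, the simulated teacher, and the update rules are assumptions for illustration, not the algorithm from the paper.

```python
import numpy as np

# Hypothetical illustration only: a toy 1-D target-reaching task where a linear
# policy is improved by (a) per-step human corrective advice (+1 / -1) and
# (b) episodic policy search over its parameters. Nothing here is taken from
# the paper; it is a generic sketch of the combination described in the abstract.

rng = np.random.default_rng(0)
theta = np.zeros(2)          # policy parameters: action = theta @ [state, 1]
advice_gain = 0.05           # step size applied to human corrections
noise_std = 0.1              # exploration noise for the policy-search step


def policy(state, params):
    return params @ np.array([state, 1.0])


def human_advice(state, action, target=1.0):
    """Simulated corrective teacher: signals the desired direction of change."""
    desired = target - state                # the 'human' wants a proportional push
    return np.sign(desired - action)        # +1: increase the action, -1: decrease it


def rollout(params, steps=20):
    state, ret = 0.0, 0.0
    for _ in range(steps):
        action = policy(state, params)
        state += 0.1 * action               # toy dynamics
        ret -= (1.0 - state) ** 2           # reward: stay close to the target 1.0
    return ret


for episode in range(200):
    # (a) corrective-advice phase: teacher feedback nudges the parameters
    state = 0.0
    for _ in range(20):
        action = policy(state, theta)
        h = human_advice(state, action)
        theta += advice_gain * h * np.array([state, 1.0])   # gradient-like nudge
        state += 0.1 * action

    # (b) policy-search phase: keep a random perturbation only if it helps
    candidate = theta + rng.normal(0.0, noise_std, size=theta.shape)
    if rollout(candidate) > rollout(theta):
        theta = candidate

print("learned parameters:", theta, "return:", rollout(theta))
```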

Cited by 22 publications (20 citation statements). References 34 publications (56 reference statements).
“…Consequently, many advice-taking systems combine different learning modalities in order to balance between autonomy and control. For example, RL can be augmented with evaluative feedback (Judah et al, 2010; Sridharan, 2011; Knox and Stone, 2012b), corrective feedback (Celemin et al, 2019), instructions (Maclin and Shavlik, 1996; Kuhlmann et al, 2004; Rosenstein et al, 2004; Pradyot et al, 2012b), instructions and evaluative feedback (Najar et al, 2020b), demonstrations (Taylor et al, 2011; Subramanian et al, 2016), demonstrations and evaluative feedback (Leon et al, 2011), or demonstrations, evaluative feedback, and instructions (Tenorio-Gonzalez et al, 2010). Demonstrations can be augmented with corrective feedback (Chernova and Veloso, 2009; Argall et al, 2011), instructions (Rybski et al, 2007), instructions and feedback, both evaluative and corrective (Nicolescu and Mataric, 2003), or with prior RL (Syed and Schapire, 2007).…”
Section: Discussion
confidence: 99%
“…Another line of work is to consider human prior knowledge of task decomposition to achieve a form of curriculum learning for more complex tasks (Wang et al, 2020). Human input to RL has also been used in combination with policy search methods and to improve robot skills on a trajectory level (Celemin and Ruiz-del Solar, 2016, 2019; Celemin et al, 2019). This is also very relevant for robotic applications, however, it should be noted that in this paper we focus only on the sequencing of skills as high-level actions.…”
Section: Related Work
confidence: 99%
“…For example, RL can be augmented with evaluative feedback [51,106,60], corrective feedback [20], instructions [79,65,102,99], instructions and evaluative feedback [90], demonstrations [112,109], demonstrations and evaluative feedback [66], or demonstrations, evaluative feedback and instructions [114]. Demonstrations can be augmented with corrective feedback [25,6], instructions [103], instructions and feedback, both evaluative and corrective [95], or with prior Reinforcement Learning [111].…”
Section: Toward a Unified View
confidence: 99%