2020
DOI: 10.1007/s10458-020-09459-6
Interactively shaping robot behaviour with unlabeled human instructions

Cited by 15 publications (52 citation statements)
References 32 publications
“…These algorithms will be presented in Section III-C. While other (more sophisticated) methods for integrating human feedback into the classical RL formulation exist, such as Policy Shaping and Value Shaping [9], [10], [40], [45], Reward Shaping suffices as a proof of concept for the feasibility of our approach. Ng et al. [44] describe the requirements Reward Shaping must satisfy to preserve the optimal policy; if these requirements are not met, positive-reward cycles can occur.…”
Section: A. Interactive RL
confidence: 99%
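The requirement the excerpt references without stating is Ng et al.'s potential-based shaping condition. A minimal sketch of that condition and of why it rules out positive-reward cycles, shown here for the undiscounted illustrative case (the symbols Φ, γ, F follow Ng et al.'s notation and are not part of the excerpt):

```latex
% Potential-based shaping reward (Ng, Harada & Russell, 1999):
% \Phi : S \to \mathbb{R} is any potential function over states.
F(s, a, s') \;=\; \gamma\,\Phi(s') - \Phi(s)

% In the undiscounted case (\gamma = 1), the shaping reward
% accumulated around any cycle s_1 \to s_2 \to \dots \to s_n \to s_1
% telescopes to zero (indices taken mod n):
\sum_{i=1}^{n} F(s_i, a_i, s_{i+1})
  \;=\; \sum_{i=1}^{n} \bigl(\Phi(s_{i+1}) - \Phi(s_i)\bigr) \;=\; 0
```

Since every cycle's shaping reward cancels, no policy can harvest unbounded reward by looping; an arbitrary, non-potential-based shaping term carries no such guarantee.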
“…In this way, new items can be actively loaded into WM, and the agent can therefore be trained to maintain novel sensory representations on-the-fly [116]. Moreover, the keypresses used here minimize the effort required to make arbitrary changes to the elicited stimuli [117]. Such interactions give external observers some time to interpret the behaviour of the robot and to form a real-time hypothesis of its forthcoming behaviour.…”
Section: Discussion
confidence: 99%
“…The means by which teaching signals can be communicated to a learning agent vary. They can be provided via natural language (Kuhlmann et al., 2004; Cruz et al., 2015; Paléologue et al., 2018), computer vision (Atkeson and Schaal, 1997; Najar et al., 2020b), hand-written programs (Maclin and Shavlik, 1996; Maclin et al., 2005a,b; Torrey et al., 2008), artificial interfaces (Abbeel et al., 2010; Suay and Chernova, 2011; Knox et al., 2013), or physical interaction (Lozano-Perez, 1983; Akgun et al., 2012). Despite the variety of communication channels, we can distinguish two main categories of teaching signals based on how they are produced: advice and demonstration.…”
Section: Reinforcement Learning With Human Advice
confidence: 99%
“…Despite the variety of communication channels, we can distinguish two main categories of teaching signals based on how they are produced: advice and demonstration. Even though advice and demonstration can share the same communication channels, like computer vision (Atkeson and Schaal, 1997; Najar et al., 2020b) and artificial interfaces (Abbeel et al., 2010; Suay and Chernova, 2011; Knox et al., 2013), they are fundamentally different in that demonstration requires the task to be executed by the teacher (demonstrated), while advice does not. In rare cases, demonstration (Whitehead, 1991; Lin, 1992) has been referred to as advice (Maclin and Shavlik, 1996; Maclin et al., 2005a).…”
Section: Reinforcement Learning With Human Advice
confidence: 99%