2017
DOI: 10.48550/arxiv.1701.06049
Preprint

Interactive Learning from Policy-Dependent Human Feedback

Cited by 12 publications (17 citation statements). References: 0 publications.
“…Human-in-the-Loop Policy Learning: Human-in-the-Loop Policy Learning allows a human to provide additional supervision during the policy learning process. One paradigm is Reinforcement Learning (RL) with human feedback [40], where a human provides rewards during agent training [8,11,24,25,34,36], but this suffers from the same limitations as IRL due to the need for extensive agent interaction.…”
Section: Related Work (mentioning)
confidence: 99%
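The statement above describes the general pattern of an agent learning from rewards delivered by a human rather than by the environment. A minimal sketch of that pattern is shown below, assuming a hypothetical discrete environment API (`reset`, `step`, `num_actions`) and a placeholder human-rating function; it is an illustration of the feedback loop, not the method of any cited paper.

```python
# Minimal sketch of RL with human-provided rewards: a human rates each
# transition and that rating replaces the environment reward in a standard
# tabular Q-learning update. Environment API and rating stub are assumptions.
import random
from collections import defaultdict

def get_human_feedback(state, action, next_state):
    """Stand-in for a human rating the last transition in [-1, 1].
    A real interactive setup would block on user input here."""
    return random.choice([-1.0, 0.0, 1.0])  # placeholder

def human_reward_q_learning(env, episodes=50, alpha=0.1, gamma=0.99, eps=0.1):
    q = defaultdict(float)                    # Q[(state, action)]
    actions = list(range(env.num_actions))    # assumes a small discrete action set
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # epsilon-greedy action selection
            if random.random() < eps:
                action = random.choice(actions)
            else:
                action = max(actions, key=lambda a: q[(state, a)])
            next_state, done = env.step(action)                # hypothetical env API
            r = get_human_feedback(state, action, next_state)  # human reward signal
            best_next = 0.0 if done else max(q[(next_state, a)] for a in actions)
            q[(state, action)] += alpha * (r + gamma * best_next - q[(state, action)])
            state = next_state
    return q
```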
“…Our work relates closely to the growing literature of interactive reinforcement learning (RL), or human-centered RL [2,21,22,23,24,25,26,27,28,29], in which agents learn from interactions with humans in addition to, or instead of, predefined environmental rewards. In the EMPATHIC framework, we use the term implicit human feedback to refer to any multi-modal evaluative signals humans naturally emit during social interactions, including facial expressions, tone of voice, head gestures, hand gestures and other body-language and vocalization modalities not aimed at explicit communication.…”
Section: Related Work (mentioning)
confidence: 99%
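As an illustration of how such multi-modal implicit signals could be folded into the reward an agent learns from, here is a small sketch. The per-modality valence estimators, their weights, and the blending rule are assumptions made for illustration, not the EMPATHIC framework's actual pipeline.

```python
# Sketch of blending a task reward with implicit human feedback (facial
# expression, tone of voice, gestures). Each modality is assumed to produce a
# valence estimate in [-1, 1]; the weighting scheme is an illustrative choice.
from typing import Dict

def implicit_feedback_reward(env_reward: float,
                             modality_valence: Dict[str, float],
                             modality_weights: Dict[str, float],
                             scale: float = 0.5) -> float:
    """Blend the task reward with a weighted average of implicit-feedback signals.

    modality_valence: e.g. {"facial": 0.4, "vocal": -0.2, "gesture": 0.0}
    modality_weights: relative trust in each modality, e.g. {"facial": 2.0, "vocal": 1.0}
    """
    total_w = sum(modality_weights.get(m, 0.0) for m in modality_valence)
    if total_w == 0.0:
        return env_reward
    valence = sum(modality_weights.get(m, 0.0) * v
                  for m, v in modality_valence.items()) / total_w
    return env_reward + scale * valence
```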
“…Preference learning. Much recent work has learned preferences from different sources of data, such as demonstrations (Ziebart et al., 2010; Ramachandran and Amir, 2007; Ho and Ermon, 2016; Fu et al., 2017; Finn et al., 2016), comparisons (Christiano et al., 2017; Sadigh et al., 2017; Wirth et al., 2017), ratings (Daniel et al., 2014), human reinforcement signals (Knox and Stone, 2009; Warnell et al., 2017; MacGlashan et al., 2017), proxy rewards (Hadfield-Menell et al., 2017), etc. We suggest preference learning with a new source of data: the state of the environment when the robot is first deployed.…”
Section: Related Work (mentioning)
confidence: 99%
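The comparison-based line of work cited in this statement fits a reward model to pairwise human preferences. A minimal Bradley-Terry-style sketch is given below, assuming trajectory segments are represented as per-step feature arrays and the reward model is linear; the data format, feature representation, and learning rate are assumptions, not the procedure of any specific cited paper.

```python
# Minimal sketch of fitting a linear reward model from pairwise preferences
# (Bradley-Terry style). preferences: list of (segment_a, segment_b, label),
# where each segment is an array of shape (steps, feature_dim) and label = 1
# if the human preferred segment_a over segment_b, else 0.
import numpy as np

def predicted_return(theta, segment):
    """Predicted return of a segment: sum of linear per-step rewards."""
    return np.sum(segment @ theta)

def fit_reward_from_preferences(preferences, feature_dim, lr=0.1, epochs=200):
    rng = np.random.default_rng(0)
    theta = rng.normal(scale=0.01, size=feature_dim)
    for _ in range(epochs):
        for seg_a, seg_b, label in preferences:
            ra = predicted_return(theta, seg_a)
            rb = predicted_return(theta, seg_b)
            p_a = 1.0 / (1.0 + np.exp(rb - ra))  # P(a preferred) under Bradley-Terry
            # Gradient ascent on the preference log-likelihood: the gradient of
            # (ra - rb) w.r.t. theta is the difference of summed segment features.
            theta += lr * (label - p_a) * (seg_a.sum(axis=0) - seg_b.sum(axis=0))
    return theta
```

Once fitted, `segment @ theta` can serve as a learned per-step reward for standard policy optimization, which is the role the comparison-based methods above give their reward models.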