2010
DOI: 10.1609/aaai.v24i1.7690

Reinforcement Learning Via Practice and Critique Advice

Abstract: We consider the problem of incorporating end-user advice into reinforcement learning (RL). In our setting, the learner alternates between practicing, where learning is based on actual world experience, and end-user critique sessions where advice is gathered. During each critique session the end-user is allowed to analyze a trajectory of the current policy and then label an arbitrary subset of the available actions as good or bad. Our main contribution is an approach for integrating all of the information gathe…

Cited by 32 publications (7 citation statements)
References 9 publications
“…A simulated human able to interact with the robot faces the table. We have defined two ways for the robot to learn from humans, drawing inspiration from the concepts of learning by evaluative feedback and learning by demonstration [Knox and Stone, 2009, Judah et al, 2010, Griffith et al, 2013]. We name respectively the two types of underlying interventions: Intervention of the type congratulation and intervention of the type takeover.…”
Section: Simulated Humans
mentioning
confidence: 99%
“…Their systems have been shown to learn faster and with less feedback than other approaches. Interactive learning from demonstrations and instructions have also been shown to help teach different ways of behaving to a learning machine [86, 88, 93–97].…”
Section: Interactive Approaches To Instruction Communication and Control
mentioning
confidence: 99%
“…This is the general form of supervised learning. In RL, it includes labeling sets of actions as good or bad (Judah et al 2010; Christiano et al 2017), learning from demonstrations (Ross and Bagnell 2010; Abbeel and Ng 2004; Ho et al 2016), and corrections to dialogue agents (Li et al 2016; Chen et al 2017). In our setting, imperative feedback specifies a counterfactual behavior: something the learner should (or should not) have done (e.g., "You should have gone to the living room.").…”
Section: Extracting MDP Features From Language
mentioning
confidence: 99%