Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence 2019
DOI: 10.24963/ijcai.2019/884
Leveraging Human Guidance for Deep Reinforcement Learning Tasks

Abstract: Reinforcement learning agents can learn to solve sequential decision tasks by interacting with the environment. Human knowledge of how to solve these tasks can be incorporated using imitation learning, in which the agent learns to imitate human-demonstrated decisions. However, human guidance is not limited to demonstrations: other types of guidance can be more suitable for certain tasks and require less human effort. This survey provides a high-level overview of five recent learning frameworks that primarily …

Cited by 45 publications (26 citation statements) · References 57 publications
“…The policy can be learned through trial and error (RL) or from an expert's demonstration (IL). A major issue of RL is its sample inefficiency, and human demonstrations have been shown to speed up learning (Silver et al. 2016; Hester et al. 2018; de la Cruz, Du, and Taylor 2018; Zhang et al. 2019).…”
Section: Introduction
confidence: 99%
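To make the demonstration-speedup idea concrete: a common recipe (in the spirit of DQfD, Hester et al. 2018) is to warm-start a policy with behavioral cloning on human (state, action) pairs before RL fine-tuning. The minimal Python sketch below uses assumed names (LinearPolicy, bc_update, a random featurizer) and is an illustration, not the cited papers' implementation.

    # Minimal sketch: pretrain a softmax policy on human demonstrations,
    # then hand the warm-started policy to any RL algorithm for fine-tuning.
    # Featurizer, sizes, and hyperparameters are illustrative assumptions.
    import numpy as np

    rng = np.random.default_rng(0)
    N_STATES, N_ACTIONS, DIM = 100, 4, 8

    def softmax(z):
        z = z - z.max()
        e = np.exp(z)
        return e / e.sum()

    class LinearPolicy:
        """pi(a|s) = softmax(W @ phi(s)) with fixed random state features."""
        def __init__(self):
            self.W = np.zeros((N_ACTIONS, DIM))
            self.phi = rng.normal(size=(N_STATES, DIM))

        def probs(self, s):
            return softmax(self.W @ self.phi[s])

        def bc_update(self, s, a_human, lr=0.1):
            # One cross-entropy gradient step toward the demonstrated action:
            # dCE/dlogits = pi(.|s) - onehot(a_human).
            grad = self.probs(s)
            grad[a_human] -= 1.0
            self.W -= lr * np.outer(grad, self.phi[s])

    policy = LinearPolicy()
    demos = [(int(rng.integers(N_STATES)), int(rng.integers(N_ACTIONS)))
             for _ in range(500)]
    for _ in range(5):                      # a few passes over the demos
        for s, a in demos:
            policy.bc_update(s, a)

After pretraining, the cloned policy gives the RL learner a sensible starting point instead of a random one, which is the sample-efficiency gain the citing papers point to.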
“…IRL has become an important apprenticeship approach to speed up convergence in classic RL problems, utilizing expert and non-expert knowledge, experience, and preferences [25]. In this context, four types of human guidance have been identified [27]: (i) standard imitation learning, in which the human trainer observes state information and demonstrates actions to the agent, which stores this data for later learning; (ii) learning from evaluative feedback, in which the human trainer watches the agent performing the task and provides instant feedback on the agent's decision in the associated state; (iii) imitation from observation, in which, in contrast to standard imitation, the agent does not have access to the human-demonstrated actions; and (iv) learning attention from humans, in which the trainer provides an attention map to the learning agent.…”
Section: B. Interactive Reinforcement Learning
confidence: 99%
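Of these, type (ii), learning from evaluative feedback, is perhaps the simplest to sketch: in the spirit of TAMER (Knox and Stone), the agent regresses a model H(s, a) of the trainer's instant scalar feedback and acts greedily on it. The tabular setup, the scripted stand-in trainer, and all names below are illustrative assumptions.

    # Minimal sketch of learning from evaluative feedback, TAMER-style:
    # fit H(s, a) to the human's instant scalar feedback, act greedily on H.
    import numpy as np

    rng = np.random.default_rng(1)
    N_STATES, N_ACTIONS = 10, 3
    H = np.zeros((N_STATES, N_ACTIONS))   # predicted human reward for (s, a)

    def update(s, a, feedback, lr=0.2):
        # Move H(s, a) toward the trainer's instant scalar feedback.
        H[s, a] += lr * (feedback - H[s, a])

    def act(s, eps=0.1):
        # Mostly exploit the feedback model; explore occasionally.
        if rng.random() < eps:
            return int(rng.integers(N_ACTIONS))
        return int(np.argmax(H[s]))

    # Interaction loop: the trainer watches (s, a) and replies with a scalar.
    # A scripted rule stands in for the human here.
    GOOD_ACTION = 2
    for step in range(200):
        s = int(rng.integers(N_STATES))
        a = act(s)
        feedback = 1.0 if a == GOOD_ACTION else -1.0   # stand-in for a human
        update(s, a, feedback)

Note the contrast with type (i): the human never demonstrates an action here, only evaluates the agent's own choices after the fact.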
“…One of the main challenges in this approach is to interpret human feedback correctly, since that interpretation determines how the feedback is used to improve the policy in the MDP framework [25]. The main categories of methods for incorporating evaluative human feedback in IRL are [27]: Policy Shaping, Reward Shaping, Intervention, and Policy-dependent Feedback.…”
Section: B. Interactive Reinforcement Learning
confidence: 99%
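As a sketch of one of those categories: reward shaping folds the human signal into the environment reward rather than replacing it, e.g. r'(s, a) = r_env(s, a) + beta * H(s, a). The tabular Q-learning step below is an assumed toy setup, not the cited papers' exact method.

    # Minimal sketch of reward shaping with evaluative human feedback:
    # the shaped reward mixes the environment reward with a learned
    # human-feedback model H (e.g., fit as in the TAMER sketch above).
    import numpy as np

    N_STATES, N_ACTIONS = 10, 3
    Q = np.zeros((N_STATES, N_ACTIONS))
    H = np.zeros((N_STATES, N_ACTIONS))   # human-feedback model
    GAMMA, LR, BETA = 0.95, 0.1, 0.5      # discount, step size, feedback weight

    def shaped_q_update(s, a, r_env, s_next):
        # Standard Q-learning step, but on the human-shaped reward.
        r_shaped = r_env + BETA * H[s, a]
        td_target = r_shaped + GAMMA * Q[s_next].max()
        Q[s, a] += LR * (td_target - Q[s, a])

Policy shaping would instead bias action selection directly, e.g. by reweighting pi(a|s) with a term derived from H, leaving the reward signal untouched.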
“…Fig. 2: The increasing research interest in human-in-the-loop, obtained through a Google Scholar search with the keywords "human-in-the-loop" and "machine learning". …such as clinical diagnosis and lack of training data [33,34,35,36].…”
Section: Introduction
confidence: 99%