Twenty-First International Conference on Machine Learning (ICML '04), 2004
DOI: 10.1145/1015330.1015430
Apprenticeship learning via inverse reinforcement learning

Abstract: We consider learning in a Markov decision process where we are not explicitly given a reward function, but where instead we can observe an expert demonstrating the task that we want to learn to perform. This setting is useful in applications (such as the task of driving) where it may be difficult to write down an explicit reward function specifying exactly how different desiderata should be traded off. We think of the expert as trying to maximize a reward function that is expressible as a linear combination of…
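
The setting sketched in the abstract (a reward assumed linear in known features, recovered from observed expert behaviour) can be illustrated with a small sketch of a projection-style apprenticeship-learning loop on a toy problem. The 4-state chain MDP, one-hot features, discount factor, and the stand-in "expert" (an optimal policy for a hidden reward, used here instead of demonstration trajectories) are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of apprenticeship learning via IRL (projection-style loop),
# assuming a toy 4-state chain MDP with one-hot state features. All problem
# details and hyperparameters here are illustrative, not from the paper.
import numpy as np

n_states, gamma = 4, 0.9
# Deterministic chain: action 0 steps left, action 1 steps right.
P = np.zeros((2, n_states, n_states))
for s in range(n_states):
    P[0, s, max(s - 1, 0)] = 1.0
    P[1, s, min(s + 1, n_states - 1)] = 1.0
phi = np.eye(n_states)                          # feature map: one-hot over states
start = np.full(n_states, 1.0 / n_states)       # uniform start-state distribution

def solve_mdp(reward, iters=200):
    """Value iteration; returns a deterministic policy (one action per state)."""
    V = np.zeros(n_states)
    for _ in range(iters):
        Q = reward[:, None] + gamma * np.tensordot(P, V, axes=([2], [0])).T  # (S, A)
        V = Q.max(axis=1)
    return Q.argmax(axis=1)

def feature_expectations(policy):
    """Exact mu(pi) = sum_t gamma^t E[phi(s_t)], via discounted state occupancies."""
    Ppi = np.array([P[policy[s], s] for s in range(n_states)])       # (S, S)
    d = np.linalg.solve(np.eye(n_states) - gamma * Ppi.T, start)     # occupancy
    return phi.T @ d

# Stand-in for expert demonstrations: feature expectations of a policy that is
# optimal for a hidden reward preferring the rightmost state.
mu_E = feature_expectations(solve_mdp(phi @ np.array([0.0, 0.0, 0.0, 1.0])))

policy = np.zeros(n_states, dtype=int)          # arbitrary initial policy
mu_bar = feature_expectations(policy)
for _ in range(20):
    w = mu_E - mu_bar                           # reward weights = unmatched direction
    if np.linalg.norm(w) <= 1e-4:               # expert matched within tolerance
        break
    policy = solve_mdp(phi @ w)                 # RL step under the current reward guess
    mu = feature_expectations(policy)
    step = mu - mu_bar
    if step @ step < 1e-12:
        break
    # Project mu_E onto the segment from mu_bar to mu (projection-style update).
    mu_bar = mu_bar + (step @ (mu_E - mu_bar)) / (step @ step) * step

print("policy:", policy, "remaining margin:", np.linalg.norm(mu_E - mu_bar))
```

The loop terminates once the learner's (mixed) feature expectations are within a small margin of the expert's, at which point the corresponding policy is guaranteed to perform nearly as well as the expert under any reward that is linear in the features.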

Cited by 2,241 publications (2,293 citation statements). References: 15 publications.
“…These two methods are Apprenticeship Learning (AL) [1] and Inverse Reinforcement Learning (IRL) [8]. In the AL framework, the agent tries to learn the expert policy or at least a policy which is as good as the expert policy (according to an unknown reward function).…”
Section: Introduction
“…AL can be reduced to classification [7,3,6,11] where the agent tries to mimic the expert policy via a Supervised Learning (SL) method such as classification. There exist also several AL algorithms inspired by IRL such as [1,10] but they need to solve recursively MDPs which is a difficult problem when the state space is large and the dynamics of the MDP is unknown.…”
Section: Introduction
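
The "reduce AL to classification" idea quoted above can be made concrete with a small behavioural-cloning sketch: treat expert (state, action) pairs as a labelled dataset and fit a classifier that imitates the expert, avoiding any repeated MDP solving. The synthetic data and the scikit-learn classifier choice below are illustrative assumptions, not drawn from the cited papers.

```python
# Behavioural-cloning sketch: imitate the expert with a supervised classifier.
# The synthetic "expert" rule below is purely illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
states = rng.normal(size=(500, 4))                 # expert-visited states (feature vectors)
expert_actions = (states[:, 0] > 0).astype(int)    # pretend the expert acts on the first feature

clf = LogisticRegression().fit(states, expert_actions)

def policy(state):
    """The learned policy is simply the classifier's predicted action for a state."""
    return int(clf.predict(state.reshape(1, -1))[0])

print("imitation accuracy on the demonstrations:", clf.score(states, expert_actions))
```
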
“…In this paper, we describe a method to take policy matrices learned on small instances and generalise them to apply to different instances, and with the particular aim to apply them to larger instances. We used a form of apprenticeship learning (a.k.a learning by demonstration or imitation learning) [1] for generalizing the demonstrations provided by an expert. Apprenticeship learning has a wide range of applications in control and robotics and is heavily based on Inverse Reinforcement Learning (IRL).…”
Section: Introduction and Related Work
“…Shirai et al analyzed differences in ship behavior in Tokyo Bay by categorizing the sizes of ships and constructing a traffic flow network [6]. However, the model they created is not useful for real-time forecasting.…”
Section: Introduction