2018
DOI: 10.1609/aaai.v32i1.11755

Efficient Probabilistic Performance Bounds for Inverse Reinforcement Learning

Abstract: In the field of reinforcement learning there has been recent progress towards safety and high-confidence bounds on policy performance. However, to our knowledge, no practical methods exist for determining high-confidence policy performance bounds in the inverse reinforcement learning setting---where the true reward function is unknown and only samples of expert behavior are given. We propose a sampling method based on Bayesian inverse reinforcement learning that uses demonstrations to determine practical high-…
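As a rough illustration of the idea described in the abstract (not the paper's exact algorithm), the sketch below assumes a linear reward R(s) = w·φ(s), a set of reward weight vectors sampled from a Bayesian IRL posterior, and precomputed discounted feature expectations for the expert and for the policy being evaluated; the α-quantile of the value gap across posterior samples is then used as a high-confidence upper bound on the performance loss. All names and data here are hypothetical stand-ins.

import numpy as np

def alpha_worst_case_bound(posterior_weights, mu_expert, mu_eval, alpha=0.95):
    # posterior_weights: (N, d) reward weights sampled via Bayesian IRL (e.g., MCMC).
    # mu_expert, mu_eval: length-d discounted feature expectations of the expert
    # and of the evaluated policy, under a linear reward R(s) = w . phi(s).
    gaps = posterior_weights @ mu_expert - posterior_weights @ mu_eval
    # With probability roughly alpha over the reward posterior, the true value gap
    # is no larger than the alpha-quantile of the sampled gaps.
    return np.quantile(gaps, alpha)

# Hypothetical usage with stand-in data
rng = np.random.default_rng(0)
W = rng.normal(size=(2000, 8))   # stand-in posterior samples of reward weights
mu_E = rng.random(8)             # expert feature expectations
mu_pi = rng.random(8)            # evaluation-policy feature expectations
print(alpha_worst_case_bound(W, mu_E, mu_pi, alpha=0.95))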

Cited by 20 publications (6 citation statements) | References 27 publications
“…Since LfD utilizes expert demonstrations, the robot can be better incentivized to stay within safe or relevant regions of the state space, especially when compared with techniques that require significant exploration, such as reinforcement learning. This is because demonstrations provide a way to assess the safety or risk associated with regions of the state space (e.g., [196][197][198]). Furthermore, several LfD methods provide and utilize measures of uncertainty associated with different parts of the state space (e.g., 62, 81, 100), enabling communication of the system's confidence to the user.…”
Section: Safe Learning
confidence: 99%
“…In contrast, ML-IRL is based on maximum likelihood estimation, which cannot incorporate prior knowledge or handle uncertainty. IRL with Bayesian optimization has been used to learn driving strategies [54], mobile robot navigation [55,56], and robot learning from demonstration [57] with good performance. Hierarchical BIRL, which extends the original formulation, outperforms MaxEnt-IRL in cab-driver route selection based on maps and GPS data [58].…”
Section: Bayesian Optimization Methods
confidence: 99%
“…Furthermore, Brown et al. [54] construct a sampling-based Bayesian IRL model that uses expert trajectories to compute practical high-confidence upper bounds on the α-worst-case difference in expected return in unseen scenarios where the true reward function is unavailable. Palan et al. [55] propose the DemPref model, which uses expert trajectories to learn a coarse reward function; the trajectories also ground the (active) query-generation process, improving the quality of the generated queries.…”
Section: A. Imitation Learning
confidence: 99%