Robotics: Science and Systems XIII 2017
DOI: 10.15607/rss.2017.xiii.069

Risk-sensitive Inverse Reinforcement Learning via Coherent Risk Models

Abstract: The literature on Inverse Reinforcement Learning (IRL) typically assumes that humans take actions in order to minimize the expected value of a cost function, i.e., that humans are risk neutral. Yet, in practice, humans are often far from being risk neutral. To fill this gap, the objective of this paper is to devise a framework for risk-sensitive IRL in order to explicitly account for an expert's risk sensitivity. To this end, we propose a flexible class of models based on coherent risk metrics, which …
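
As a point of reference for the terminology above (this is the standard definition from the risk-measure literature, not an excerpt from the paper), the best-known coherent risk metric is the Conditional Value-at-Risk (CVaR). In the Rockafellar-Uryasev variational form, for a cost random variable Z and risk level \alpha \in [0,1),

\[
\mathrm{CVaR}_{\alpha}(Z) \;=\; \min_{\nu \in \mathbb{R}} \Big\{ \nu + \tfrac{1}{1-\alpha}\, \mathbb{E}\big[(Z-\nu)^{+}\big] \Big\}, \qquad (x)^{+} := \max(x,0).
\]

Setting \alpha = 0 recovers the risk-neutral objective \mathbb{E}[Z], while \alpha \to 1 approaches the worst-case cost, so a single parameter interpolates between the risk-neutral assumption criticized above and extreme risk aversion.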

Cited by 37 publications (58 citation statements) | References 44 publications

“…One possibility is to learn a distortion risk metric that explains how humans evaluate risk in the given application domain and then employ the learned risk metric. We describe first steps towards this in [20], where we have introduced a framework for risk-sensitive inverse reinforcement learning for learning humans' risk preferences from the class of coherent risk metrics.…”
Section: Discussion
confidence: 99%
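
For context on the quote above, a distortion risk metric (standard definition, not quoted from [20]) is obtained by applying a distortion function g to the tail probabilities of a non-negative cost Z:

\[
\rho_{g}(Z) \;=\; \int_{0}^{\infty} g\big(\mathbb{P}(Z > t)\big)\, dt,
\]

where g : [0,1] \to [0,1] is nondecreasing with g(0)=0 and g(1)=1; when g is concave, \rho_{g} is coherent. CVaR at level \alpha corresponds to the choice g(u) = \min\{u/(1-\alpha),\, 1\}, so learning g amounts to learning how heavily a human weights low-probability, high-cost outcomes.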
“…Since such instrumentation needs to be done for any new task that we may wish to learn, it poses a significant bottleneck to widespread adoption of reinforcement learning for robotics, and precludes the use of these methods directly in open-world environments that lack this instrumentation. Data-driven approaches for reward specification [29,1,11,17,12,48,33,9,27,4,18] seek to overcome this issue, but typically require demonstration data to acquire rewards. Such data can be onerous and time-consuming for users to provide.…”
Section: Related Work
confidence: 99%
“…Consequently, the predicted trajectories might be too conservative to reflect potential dangers in some situations. Although recent works such as [25] and [26] have introduced some "irrational" behaviors into planning-based approaches, it is still not sufficient to cover all human driving behaviors. Moreover, to obtain a solvable and interpretable planning problem, the learned reward/cost functions are typically linear combinations of features.…”
Section: A. Motivation
confidence: 99%
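
To make the last point concrete (the notation is generic, not taken from [25] or [26]), "linear combinations of features" means a learned cost of the form

\[
c_{w}(s,a) \;=\; w^{\top}\phi(s,a) \;=\; \sum_{i=1}^{k} w_{i}\,\phi_{i}(s,a),
\]

where \phi(s,a) \in \mathbb{R}^{k} are hand-designed features and only the weight vector w is learned. This is what keeps the planning problem solvable and interpretable, but it also caps the expressiveness of the behaviors the model can capture.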