2018
DOI: 10.1177/0278364918772017
Risk-sensitive inverse reinforcement learning via semi- and non-parametric methods

Abstract: The literature on Inverse Reinforcement Learning (IRL) typically assumes that humans take actions in order to minimize the expected value of a cost function, i.e., that humans are risk neutral. Yet, in practice, humans are often far from being risk neutral. To fill this gap, the objective of this paper is to devise a framework for risk-sensitive IRL in order to explicitly account for a human's risk sensitivity. To this end, we propose a flexible class of models based on coherent risk measures, which allow us t…
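The coherent risk measures mentioned in the abstract generalize the plain expected cost by weighting bad outcomes more heavily; Conditional Value-at-Risk (CVaR) is a standard example of such a measure. The sketch below is purely illustrative (the abstract does not specify the paper's exact model class) and assumes a discrete cost distribution, with alpha taken as the confidence level so that CVaR averages the worst (1 − alpha) fraction of outcomes.

```python
import numpy as np

def cvar(costs, probs, alpha):
    """Conditional Value-at-Risk at confidence level alpha for a discrete cost distribution.

    CVaR is a standard example of a coherent risk measure: it averages the
    worst (1 - alpha) fraction of outcomes instead of the plain expectation,
    which is how a risk-sensitive model penalizes rare, high-cost events.
    """
    order = np.argsort(costs)[::-1]                # sort outcomes from worst to best
    costs, probs = np.asarray(costs, float)[order], np.asarray(probs, float)[order]
    tail = 1.0 - alpha                             # probability mass of the "bad" tail
    acc, value = 0.0, 0.0
    for c, p in zip(costs, probs):
        take = min(p, tail - acc)                  # clip the last outcome at the tail boundary
        value += take * c
        acc += take
        if acc >= tail:
            break
    return value / tail

# A risk-neutral agent scores this lottery by its mean; a risk-sensitive one by CVaR.
costs = [0.0, 1.0, 10.0]
probs = [0.7, 0.25, 0.05]
print(np.dot(costs, probs))      # expected cost = 0.75
print(cvar(costs, probs, 0.9))   # CVaR_0.9 = 5.5 (average of the worst 10% of outcomes)
```

Under this convention, alpha → 0 recovers the risk-neutral expectation, while larger alpha places all the weight on the worst outcomes.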

Cited by 17 publications (19 citation statements)
References 62 publications
“…These normative approaches to adaptation in human-agent and human-robot teams develop explicit models of human decision-making in the context of a shared-reward Markov game with partial information, inferring the underlying reward function (i.e., goals and intents) of the human based on observed behavior; the robot or agent policy is then derived from a partially observable Markov decision process (POMDP), which maintains beliefs over the human's reward function [13]. Many of these models assume human policies are stationary, Boltzmann rational, and are generated with ideal understanding of the environmental dynamics.…”
Section: Related Work (mentioning)
confidence: 99%
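For reference, a minimal sketch of the Boltzmann-rational (softmax) action model named in the excerpt above; the Q-values and the rationality parameter beta are illustrative, not taken from the cited work.

```python
import numpy as np

def boltzmann_policy(q_values, beta=1.0):
    """Boltzmann-rational action distribution.

    The human is modeled as picking action a with probability proportional to
    exp(beta * Q(s, a)): beta -> infinity recovers a perfectly rational agent,
    beta -> 0 a uniformly random one. In IRL, the likelihood of a demonstration
    is the product of these probabilities over the observed state-action pairs.
    """
    logits = beta * np.asarray(q_values, dtype=float)
    logits -= logits.max()          # subtract the max for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

# Three actions with Q-values 1.0, 0.5, 0.0 under moderate rationality.
print(boltzmann_policy([1.0, 0.5, 0.0], beta=2.0))   # roughly [0.67, 0.24, 0.09]
```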
“…Many of these models assume human policies are stationary, Boltzmann rational, and are generated with ideal understanding of the environmental dynamics. Recent research has focused on relaxing these assumptions to better capture real-world human behaviors, such as considering non-stationary human rewards [14], mutual adaptation [15], risk-sensitivity [13], imperfect understanding of environment dynamics [9], or more representative models of rationality [16]. Normative approaches to human behavior suffer from several drawbacks that are relevant to our work.…”
Section: Related Work (mentioning)
confidence: 99%
“…For example, Englert et al. (2017) attempts to learn the cost function of a constrained optimization problem from optimal demonstrations by minimizing the residuals of the Karush–Kuhn–Tucker (KKT) conditions, but the constraints themselves are assumed known. On the other hand, a risk-sensitive approach to IRL is proposed in Singh et al. (2018) and is complementary to our work, which aims to learn hard constraints. Another approach in Amin et al. (2017) can represent a state-space constraint shared across tasks as a penalty term in the reward function of a Markov decision process.…”
Section: Related Work (mentioning)
confidence: 99%
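A rough sketch of the KKT-residual idea attributed to Englert et al. (2017) in the excerpt above: with the constraints known, the cost parameters are chosen to minimize the violation of the KKT stationarity and complementary-slackness conditions at the demonstrated (assumed optimal) point. The quadratic features, single constraint, and demonstration below are invented for illustration only.

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative setup (not from the cited paper): cost c_theta(x) = theta . phi(x)
# with phi(x) = (x0^2, x1^2), and one known inequality constraint g(x) = 1 - x0 <= 0.
def phi_grad(x):                        # rows: gradient of each cost feature at x
    return np.array([[2 * x[0], 0.0],
                     [0.0, 2 * x[1]]])

def g(x):
    return np.array([1.0 - x[0]])       # known constraint (given, not learned here)

def g_grad(x):
    return np.array([[-1.0, 0.0]])

x_star = np.array([1.0, 0.0])           # observed demonstration, assumed optimal

def kkt_residual(z):
    theta, lam = z[:2], z[2:]
    stationarity = theta @ phi_grad(x_star) + lam @ g_grad(x_star)   # gradient of the Lagrangian
    comp_slack = lam * g(x_star)                                     # lambda_i * g_i(x*)
    return np.sum(stationarity**2) + np.sum(comp_slack**2)

# lambda >= 0 (dual feasibility); theta is normalized to sum to 1 because the
# residual is homogeneous and would otherwise collapse to the trivial theta = 0.
res = minimize(kkt_residual, x0=np.ones(3),
               bounds=[(None, None), (None, None), (0.0, None)],
               constraints=[{"type": "eq", "fun": lambda z: z[0] + z[1] - 1.0}])
print(res.x)   # recovered (theta_0, theta_1, lambda_0) with near-zero KKT residual
```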
“…where $\overline{\theta}^{m}_{n} / \underline{\theta}^{m}_{n}$ are the upper/lower extents of dimension n of box m. We now modify the KKT conditions to handle the "or" constraints in (10). Primal feasibility (5c) changes to…
Section: B. Unions of Offset-Parameterized Constraints (mentioning)
confidence: 99%
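The excerpt above concerns primal feasibility for a union ("or") of parameterized boxes. A minimal sketch of that disjunctive feasibility check, assuming axis-aligned boxes and using lower/upper arrays in place of the θ extent parameters from the excerpt:

```python
import numpy as np

def in_union_of_boxes(x, lower, upper):
    """Primal feasibility for a union ("or") of axis-aligned boxes.

    lower[m, n] / upper[m, n] are the lower / upper extents of dimension n of
    box m. The point x is feasible if it lies inside at least one box, i.e. the
    per-box constraints are OR-ed rather than AND-ed as in the standard
    (conjunctive) primal-feasibility condition of a KKT system.
    """
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    inside_each_box = np.all((x >= lower) & (x <= upper), axis=1)   # one flag per box
    return bool(inside_each_box.any())

# Two unit boxes, [0,1]^2 and [2,3]^2: the point (2.5, 2.5) is feasible because it
# satisfies the second box's constraints even though it violates the first.
lower = [[0.0, 0.0], [2.0, 2.0]]
upper = [[1.0, 1.0], [3.0, 3.0]]
print(in_union_of_boxes(np.array([2.5, 2.5]), lower, upper))   # True
```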
“…In contrast, our method explicitly learns the constraints. The risk-sensitive IRL approach in [10] also uses the KKT conditions, and is complementary to our work, which learns hard constraints. Perhaps the closest to our work is [11], which aims to recover a cost function and constraint simultaneously using the KKT conditions.…”
Section: Introduction (mentioning)
confidence: 99%