2019
DOI: 10.48550/arxiv.1911.01546
Preprint

Being Optimistic to Be Conservative: Quickly Learning a CVaR Policy

Abstract: While maximizing expected return is the goal in most reinforcement learning approaches, risk-sensitive objectives such as conditional value at risk (CVaR) are more suitable for many high-stakes applications. However, relatively little is known about how to explore to quickly learn policies with good CVaR. In this paper, we present the first algorithm for sample-efficient learning of CVaR-optimal policies in Markov decision processes based on the optimism in the face of uncertainty principle. This method relies…
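
To make the CVaR objective named in the abstract concrete, here is a minimal sketch of the standard empirical lower-tail estimator: the average of the worst α fraction of sampled returns. It is not the paper's algorithm, which concerns exploration and is not described in the text shown here; the function name `empirical_cvar` and the sample data are hypothetical and used only for illustration.

```python
import math


def empirical_cvar(returns, alpha):
    """Average of the worst ceil(alpha * n) returns (lower tail).

    Illustrative estimator only; the name and interface are assumptions,
    not taken from the paper.
    """
    if not returns:
        raise ValueError("returns must be non-empty")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    ordered = sorted(returns)  # ascending: worst outcomes first
    # Small tolerance so float products such as 0.1 * 30 do not round up a step.
    k = max(1, math.ceil(alpha * len(ordered) - 1e-9))
    tail = ordered[:k]
    return sum(tail) / k


# Hypothetical sample of episode returns.
sample_returns = [10, -5, 3, 8, -20, 1, 7, 4, 6, 2]
cvar_20 = empirical_cvar(sample_returns, 0.2)   # mean of the two worst returns
cvar_100 = empirical_cvar(sample_returns, 1.0)  # reduces to the plain mean
```

With α = 1 the estimator reduces to the ordinary mean, so the expected-return objective is the special case in which no tail is singled out; smaller α puts all the weight on the worst outcomes.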

Cited by 2 publications
(2 citation statements)
References 12 publications
“…Dalal et al (2018) et al, 2000) is commonly used in quantitative finance, which aims to maximize returns in the worst α% of cases. This allows the agent to ensure that it learns safe policies for deployment that achieve high reward under the aleatoric uncertainty of the MDP (Tang et al, 2019;Keramati et al, 2019;Tamar et al, 2014;Kalashnikov et al, 2018;Borkar & Jain, 2010;Chow & Ghavamzadeh, 2014).…”
Section: Related Work (mentioning)
confidence: 99%
“…An additional approach is Bayesian and distributional RL (Bellemare et al, 2017), which seeks to track a full posterior over returns. These approaches benefit from the fact that with access to a full distribution, one may define risk specifically, with, e.g., conditional value at risk (CVaR) (Keramati et al, 2019). One limitation is that succinctly parameterizing the value distribution intersects with approximate Bayesian computation, an active area of research (Yang et al, 2019).…”
Section: Introduction (mentioning)
confidence: 99%