2021
DOI: 10.48550/arxiv.2112.13414
Preprint

Reinforcement Learning with Dynamic Convex Risk Measures

Abstract: We develop an approach for solving time-consistent risk-sensitive stochastic optimization problems using model-free reinforcement learning (RL). Specifically, we assume agents assess the risk of a sequence of random variables using dynamic convex risk measures. We employ a time-consistent dynamic programming principle to determine the value of a particular policy, and develop policy gradient update rules. We further develop an actor-critic style algorithm using neural networks to optimize over policies. Finall…
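To make the "time-consistent dynamic programming principle" in the abstract concrete, here is a minimal sketch of evaluating a fixed policy under a dynamic risk measure by backward recursion, V_t(s) = ρ_t(c_t + V_{t+1}(s')). It is not the authors' algorithm, which is model-free and uses neural networks. It assumes a small known Markov chain (the policy is already folded into the transition matrix) and picks CVaR as one example of a convex one-step risk measure. The names `cvar`, `evaluate_policy_risk`, `P`, `cost`, and `alpha` are hypothetical and chosen only for illustration.

```python
import numpy as np


def cvar(costs, probs, alpha):
    """CVaR at level alpha of a discrete cost distribution (larger = worse).

    Averages the costs over the worst (1 - alpha) share of probability mass.
    Assumes 0 <= alpha < 1 and that probs sums to 1.
    """
    costs = np.asarray(costs, dtype=float)
    probs = np.asarray(probs, dtype=float)
    order = np.argsort(costs)[::-1]  # worst outcomes first
    tail = 1.0 - alpha
    remaining = tail
    total = 0.0
    for c, p in zip(costs[order], probs[order]):
        w = min(p, remaining)  # mass of this outcome that falls in the tail
        total += w * c
        remaining -= w
        if remaining <= 1e-12:
            break
    return total / tail


def evaluate_policy_risk(P, cost, horizon, alpha):
    """Backward recursion V_t(s) = CVaR_alpha(cost[s, s'] + V_{t+1}(s')).

    P[s, s'] is the transition probability under the fixed policy,
    cost[s, s'] the cost of that transition, and the terminal value is 0.
    Illustrative assumption: the same one-step risk measure at every period.
    """
    P = np.asarray(P, dtype=float)
    cost = np.asarray(cost, dtype=float)
    n = P.shape[0]
    V = np.zeros(n)
    for _ in range(horizon):
        V = np.array([cvar(cost[s] + V, P[s], alpha) for s in range(n)])
    return V


# Toy two-state chain: moving to state 1 costs 1, moving to state 0 costs 0.
P = np.array([[0.5, 0.5], [0.5, 0.5]])
cost = np.array([[0.0, 1.0], [0.0, 1.0]])
risk_neutral = evaluate_policy_risk(P, cost, horizon=2, alpha=0.0)
risk_averse = evaluate_policy_risk(P, cost, horizon=2, alpha=0.5)
```

With `alpha=0.0` the recursion reduces to the ordinary expected cost (1.0 from either state over two periods), while `alpha=0.5` averages only the worse half of outcomes at each step and gives 2.0. The nesting of one-step risk measures inside the recursion is what makes the evaluation time-consistent in this sketch.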

Cited by 4 publications (8 citation statements)
References 27 publications
“…In this section, we validate our proposed framework on two benchmark applications. We apply our actor-critic algorithm on a statistical arbitrage example in Subsection 7.1 and recover results from Coache and Jaimungal (2021). We also explore a portfolio allocation problem and solve it using our model-agnostic approach in Subsection 7.2.…”
Section: Methods; citation type: mentioning
confidence: 99%
“…Instead of assuming that the one-step conditional risk measures ρ_t are convex (see e.g. Coache and Jaimungal, 2021) or coherent (see e.g. Ruszczyński, 2010; Tamar et al., 2016), we impose stronger properties to focus on a narrower class of risk measures, so that we can develop more efficient learning methodologies that do not require nested simulations.…”
Section: Dynamic Risk Setting; citation type: mentioning
confidence: 99%
“…They also require the underlying MDP to exhibit a certain strong continuous/semi-continuous transition mechanism. [15] develops a computational approach for optimization with dynamic convex risk measures using deep learning techniques. Finally, it is noteworthy that the concept of risk form is introduced in [17] and is applied to handle two-stage MDP with partial information and decision-dependent observation distribution.…”
Citation type: mentioning
confidence: 99%