Reinforcement Learning with Dynamic Convex Risk Measures

Coache, Anthony; Jaimungal, Sebastian

doi:10.48550/arxiv.2112.13414

Cited by 4 publications

(8 citation statements)

References 27 publications


“…In this section, we validate our proposed framework on two benchmark applications. We apply our actor-critic algorithm on a statistical arbitrage example in Subsection 7.1 and recover results from Coache and Jaimungal (2021). We also explore a portfolio allocation problem and solve it using our model-agnostic approach in Subsection 7.2.…”

Section: Methods (mentioning)

confidence: 99%

“…Instead of assuming that the one-step conditional risk measures ρ_t are convex (see e.g. Coache and Jaimungal, 2021) or coherent (see e.g. Ruszczyński, 2010; Tamar et al., 2016), we impose stronger properties to focus on a narrower class of risk measures, so that we can develop more efficient learning methodologies that do not require nested simulations.…”

Section: Dynamic Risk Setting (mentioning)

confidence: 99%

“…A proposed methodology consists of approximating the value function at a certain state by simulating several transitions, which we refer to as nested approach (see e.g. Coache and Jaimungal, 2021). In that work, the authors develop a RL algorithm that estimates the value function by generating additional (inner) transitions for every visited state of an (outer) episode -e.g.…”

Section: Reinforcement Learning (mentioning)

confidence: 99%
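The nested approach described in the statement above can be illustrated with a minimal sketch. It is not the authors' algorithm: the environment (`toy_step`), the choice of CVaR as the one-step risk measure, and all function names are assumptions made purely for illustration. For every state visited along an outer episode, extra inner transitions are simulated from that state and used to estimate the one-step risk.

```python
import numpy as np

def cvar(costs, alpha):
    """Empirical CVaR at level alpha: mean of the worst (1 - alpha) fraction of costs."""
    costs = np.sort(np.asarray(costs, dtype=float))
    k = int(np.ceil(alpha * len(costs)))
    k = min(k, len(costs) - 1)  # keep at least one sample in the tail
    return costs[k:].mean()

def toy_step(state, rng):
    """Hypothetical environment: random-walk state, cost is the squared next state."""
    next_state = state + rng.normal()
    return next_state, next_state ** 2

def nested_risk_estimates(n_steps, n_inner, alpha, seed=0):
    """Walk one outer episode; at each visited state, simulate n_inner
    additional (inner) transitions and estimate the one-step CVaR of the cost."""
    rng = np.random.default_rng(seed)
    state = 0.0
    estimates = []
    for _ in range(n_steps):
        # Inner transitions: sampled from the current state, never advance the episode.
        inner_costs = [toy_step(state, rng)[1] for _ in range(n_inner)]
        estimates.append(cvar(inner_costs, alpha))
        # Outer transition: the single step that actually moves the episode forward.
        state, _ = toy_step(state, rng)
    return estimates
```

The sketch makes the cost of nesting visible: an episode of `n_steps` outer transitions requires `n_steps * n_inner` additional simulated transitions.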

“…However, these works require computing the value function for every possible state of the environment, limiting their applicability to problems with a small number of state-action pairs. A recent development in risk-aware RL is that in Coache and Jaimungal (2021). The authors use dynamic convex risk measures and devise a model-free approach to solve finite-horizon RL problems in a time-consistent manner.…”

Section: Introduction (mentioning)

confidence: 99%


Conditionally Elicitable Dynamic Risk Measures for Deep Reinforcement Learning

Coache, Jaimungal, Cartea

2022

Preprint

Self Cite


We propose a novel framework to solve risk-sensitive reinforcement learning (RL) problems where the agent optimises time-consistent dynamic spectral risk measures. Based on the notion of conditional elicitability, our methodology constructs (strictly consistent) scoring functions that are used as penalizers in the estimation procedure. Our contribution is threefold: we (i) devise an efficient approach to estimate a class of dynamic spectral risk measures with deep neural networks, (ii) prove that these dynamic spectral risk measures may be approximated to any arbitrary accuracy using deep neural networks, and (iii) develop a risk-sensitive actor-critic algorithm that uses full episodes and does not require any additional nested transitions. We compare our conceptually improved reinforcement learning algorithm with the nested simulation approach and illustrate its performance in two settings: statistical arbitrage and portfolio allocation on both simulated and real data.
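As a rough illustration of what elicitability means, the sketch below uses the simplest textbook case rather than the scoring functions constructed in the paper: the α-quantile (value-at-risk) is elicitable because it minimises the expected pinball score. The function names and the grid search are assumptions chosen for illustration only.

```python
import numpy as np

def pinball_score(z, y, alpha):
    """Average pinball (quantile) score of the point forecast z against samples y.
    Each term is (1{y <= z} - alpha) * (z - y), which is always non-negative."""
    y = np.asarray(y, dtype=float)
    return np.mean((np.where(y <= z, 1.0, 0.0) - alpha) * (z - y))

def argmin_score(y, alpha, grid):
    """Return the grid point with the lowest average score (brute-force search)."""
    scores = [pinball_score(z, y, alpha) for z in grid]
    return grid[int(np.argmin(scores))]

# Usage: on the samples 0, 1, ..., 100 the score at alpha = 0.9 is minimised
# at the empirical 0.9-quantile.
samples = np.arange(101.0)
best = argmin_score(samples, 0.9, np.arange(101.0))
```

In a learning setting the grid search would typically be replaced by gradient descent on the same score, with `z` produced by a function approximator; that substitution is an assumption here, not a description of the paper's estimator.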



“…They also require the underlying MDP to exhibit a certain strong continuous/semi-continuous transition mechanism. [15] develops a computational approach for optimization with dynamic convex risk measures using deep learning techniques. Finally, it is noteworthy that the concept of risk form is introduced in [17] and is applied to handle two-stage MDP with partial information and decision-dependent observation distribution.…”

mentioning

confidence: 99%

Markov decision processes with Kusuoka-type conditional risk mappings

Ziteng, Jaimungal

2022

Preprint

Self Cite


The Kusuoka representation of proper lower semi-continuous law invariant coherent risk measures allows one to cast them in terms of average value-at-risk. Here, we introduce the notion of Kusuoka-type conditional risk-mappings and use it to define a dynamic risk measure. We use such dynamic risk measures to study infinite horizon Markov decision processes with random costs and random actions. Under mild assumptions, we derive a dynamic programming principle and prove the existence of an optimal policy. We also derive the Q-learning version of the dynamic programming principle, which is important for applications. Furthermore, we provide a sufficient condition for when deterministic actions are optimal.
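For reference, the classical (static) Kusuoka representation mentioned in the abstract can be written as follows. This is the standard textbook form, stated under the assumption of an atomless probability space and a loss-based sign convention; the conditional version used in the paper may differ in notation and conventions.

```latex
% Kusuoka representation of a law-invariant coherent risk measure \rho:
% \mathcal{M} is a set of probability measures on [0,1).
\rho(X) \;=\; \sup_{\mu \in \mathcal{M}} \int_{[0,1)} \mathrm{AVaR}_{\alpha}(X)\, \mu(\mathrm{d}\alpha),
\qquad
\mathrm{AVaR}_{\alpha}(X) \;=\; \frac{1}{1-\alpha} \int_{\alpha}^{1} \mathrm{VaR}_{u}(X)\, \mathrm{d}u .
```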


Recent advances in reinforcement learning in finance

Hambly, Yang

2023

Mathematical Finance


The rapid changes in the finance industry due to the increasing amount of data have revolutionized the techniques for data processing and data analysis and brought new theoretical and computational challenges. In contrast to classical stochastic control theory and other analytical approaches for solving financial decision‐making problems that rely heavily on model assumptions, new developments from reinforcement learning (RL) are able to make full use of the large amount of financial data with fewer model assumptions and to improve decisions in complex financial environments. This survey paper aims to review the recent developments and use of RL approaches in finance. We give an introduction to Markov decision processes, which is the setting for many of the commonly used RL approaches. Various algorithms are then introduced with a focus on value‐ and policy‐based methods that do not require any model assumptions. Connections are made with neural networks to extend the framework to encompass deep RL algorithms. We then discuss in detail the application of these RL algorithms in a variety of decision‐making problems in finance, including optimal execution, portfolio optimization, option pricing and hedging, market making, smart order routing, and robo‐advising. Our survey concludes by pointing out a few possible future directions for research.

