Risk-Sensitive Markov Control Processes

Shen, Yun; Stannat, Wilhelm; Obermayer, Klaus

doi:10.1137/120899005

Cited by 76 publications

(66 citation statements)

References 61 publications

“…In risk-sensitive sequential decision-making, the objective is to maximize a risk-sensitive criterion such as the expected exponential utility (Howard and Matheson 1972), a variance related measure (Sobel 1982; Filar et al 1989), the percentile performance (Filar et al 1995), or conditional value-at-risk (CVaR) (Ruszczyński 2010; Shen et al 2013). Unfortunately, when we include a measure of risk in our optimality criteria, the corresponding optimal policy is usually no longer Markovian stationary (e.g., Filar et al 1989) and/or computing it is not tractable (e.g., Filar et al 1989; Mannor and Tsitsiklis 2011).…”

Section: Introduction (mentioning)

confidence: 99%

Variance-constrained actor-critic algorithms for discounted and average reward MDPs

Prashanth

Ghavamzadeh

2016

Mach Learn


In many sequential decision-making problems we may want to manage risk by minimizing some measure of variability in rewards in addition to maximizing a standard criterion. Variance related risk measures are among the most common risk-sensitive criteria in finance and operations research. However, optimizing many such criteria is known to be a hard problem. In this paper, we consider both discounted and average reward Markov decision processes. For each formulation, we first define a measure of variability for a policy, which in turn gives us a set of risk-sensitive criteria to optimize. For each of these criteria, we derive a formula for computing its gradient. We then devise actor-critic algorithms that operate on three timescales: a TD critic on the fastest timescale, a policy gradient (actor) on the intermediate timescale, and a dual ascent for Lagrange multipliers on the slowest timescale. In the discounted setting, we point out the difficulty in estimating the gradient of the variance of the return and incorporate simultaneous perturbation approaches to alleviate this. The average setting, on the other hand, allows for an actor update using compatible features to estimate the gradient of the variance. We establish the convergence of our algorithms to locally risk-sensitive optimal policies. Finally, we demonstrate the usefulness of our algorithms in a traffic signal control application.
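The risk-sensitive criteria named in the citation statement above (expected exponential utility, a variance-related measure, and CVaR) can be illustrated on a plain sample of returns. The following is a minimal sketch, not the method of any cited paper: the function names, the parameters `beta`, `lam`, and `alpha`, and the simple empirical estimators are all assumptions chosen for illustration.

```python
import math


def exponential_utility_ce(returns, beta):
    """Certainty equivalent (1/beta) * log E[exp(beta * R)] of sampled returns.

    Assumption: beta is nonzero; beta < 0 penalizes variability in rewards.
    """
    n = len(returns)
    # Log-sum-exp shift for numerical stability.
    shift = max(beta * r for r in returns)
    log_mean = shift + math.log(sum(math.exp(beta * r - shift) for r in returns) / n)
    return log_mean / beta


def mean_variance(returns, lam):
    """Mean of the returns minus lam times their (population) variance."""
    n = len(returns)
    mean = sum(returns) / n
    var = sum((r - mean) ** 2 for r in returns) / n
    return mean - lam * var


def cvar_lower_tail(returns, alpha):
    """Simple empirical CVaR: average of the worst ceil(alpha * n) returns."""
    n = len(returns)
    k = max(1, math.ceil(alpha * n))
    worst = sorted(returns)[:k]
    return sum(worst) / k


if __name__ == "__main__":
    sample = [0.0, 2.0]  # hypothetical returns, not data from any cited paper
    print(exponential_utility_ce(sample, beta=-1.0))
    print(mean_variance(sample, lam=0.5))
    print(cvar_lower_tail(sample, alpha=0.5))
```

For a constant sample all three criteria reduce to that constant. For a variable sample each one falls below the plain mean, which is the sense in which they penalize risk.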


“…The risk-averse optimal trade strategy is then obtained by maximizing the following risk-averse objective [5]:…”

Section: Model (mentioning)

confidence: 99%

“…Hence, it cannot measure the risk associated with μ_{s,a} efficiently. For more detailed discussion, readers may refer to [5].…”

Section: Remarks (mentioning)

confidence: 99%

“…In this paper, we propose a general framework of risk-averse trading algorithms based on the risk-sensitive Markov decision processes (RS-MDP, [5], [6]) to solve a common high-frequency trade problem faced by an institutional trader: the optimal trade execution (see e.g., [7], [8], [9] and references therein), i.e., to liquidate a huge inventory over a short time horizon. Specifically, we derive a risk-averse objective function within the framework of RS-MDP, which balances the expected revenue and the associated risk by applying a nonlinear transformation.…”

Section: Introduction (mentioning)

confidence: 99%

“…The authors also did experiments with an additional feature of the order book, the imbalance of depths, and conclude that its effects on the performance of all RL algorithms are marginal 5. Our model can be extended to include the order size as another dimension of the action space, given an estimate for the market impact of limit orders (see e.g., [19]).…”

mentioning

confidence: 99%


Risk-averse reinforcement learning for algorithmic trading

Shen

Huang

Yan

et al. 2014

2014 IEEE Conference on Computational Intelligence for Financial Engineering & Economics (CIFEr)

Self Cite


We propose a general framework of risk-averse reinforcement learning for algorithmic trading. Our approach is tested in an experiment based on 1.5 years of millisecond timescale limit order data from NASDAQ, which contain the data around the 2010 flash crash. The results show that our algorithm outperforms the risk-neutral reinforcement learning algorithm by 1) keeping the trading cost at a substantially low level at the spot when the flash crash happened, and 2) significantly reducing the risk over the whole test period.


Long run risk sensitive portfolio with general factors

Pitera

Stettner

2015

Math Meth Oper Res


In the paper, portfolio optimization over a long-run risk-sensitive criterion is considered. It is assumed that economic factors which stimulate asset prices are ergodic but not necessarily uniformly ergodic. A solution to a suitable Bellman equation using local span contraction with weighted norms is shown. The form of the optimal strategy is presented, and examples of market models satisfying the imposed assumptions are shown.
