1972
DOI: 10.1287/mnsc.18.7.356

Risk-Sensitive Markov Decision Processes

Abstract: This paper considers the maximization of certain equivalent reward generated by a Markov decision process with constant risk sensitivity. First, value iteration is used to optimize possibly time-varying processes of finite duration. Then a policy iteration procedure is developed to find the stationary policy with highest certain equivalent gain for the infinite duration case. A simple example demonstrates both procedures.
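
As a rough illustration of the finite-horizon value-iteration procedure the abstract describes, the certain-equivalent Bellman recursion induced by the exponential utility u_γ(x) = −exp(−γx) can be sketched as below. This is a minimal sketch, not the paper's original notation or sign convention; the array layouts P[a][s, s'] and R[a][s] and the function name are assumptions made for the example.

```python
import numpy as np

def risk_sensitive_value_iteration(P, R, gamma_risk, horizon):
    """Finite-horizon value iteration on the certain-equivalent
    Bellman recursion induced by u(x) = -exp(-gamma_risk * x).

    P[a][s, s'] -- transition probabilities under action a (assumed layout)
    R[a][s]     -- immediate reward for action a in state s (assumed layout)
    """
    n_actions = len(P)
    n_states = P[0].shape[0]
    V = np.zeros(n_states)                      # terminal values
    policy = np.zeros((horizon, n_states), dtype=int)
    for t in reversed(range(horizon)):
        Q = np.empty((n_actions, n_states))
        for a in range(n_actions):
            # Certain equivalent of the continuation value under action a:
            # -(1/gamma) * log E[exp(-gamma * V(s'))]
            exp_util = P[a] @ np.exp(-gamma_risk * V)
            Q[a] = R[a] - (1.0 / gamma_risk) * np.log(exp_util)
        policy[t] = Q.argmax(axis=0)            # greedy action per state
        V = Q.max(axis=0)
    return V, policy
```

As gamma_risk tends to 0 the certain equivalent tends to the plain expectation, so the recursion recovers the ordinary risk-neutral Bellman update; this limit makes a convenient sanity check for an implementation.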


Cited by 432 publications (305 citation statements); references 1 publication.

“…Barz and Waldmann [6] introduce an exponential utility function and employ the results of Howard and Matheson [25] in order to derive an optimal policy for this approach. An exponential utility function has the form u_γ(x) = −exp(−γx), with the positive parameter γ determining the level of risk aversion.…”
Section: Exponential Utility Function (citation type: mentioning)
confidence: 99%
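
For context, the exponential utility quoted above is exactly what makes the paper's "certain equivalent" tractable: for a random payoff X, inverting the utility of the expected utility gives the standard identity

$$\mathrm{CE}_\gamma(X) \;=\; u_\gamma^{-1}\!\left(\mathbb{E}\left[u_\gamma(X)\right]\right) \;=\; -\frac{1}{\gamma}\,\ln \mathbb{E}\!\left[e^{-\gamma X}\right],$$

which is the quantity the value-iteration sketch above propagates backward through the horizon.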
“…In relation to other risk-sensitive approaches, such as the ones that use exponential utility functions [Howard and Matheson, 1972; Coraluppi and Marcus, 1999] and the expected-value-minus-variance criterion [Heger, 1994], we do not need the transition probabilities, since we use a model-free approach. Moreover, we are also able to vary the daring factor during the life of the agent, since the learnt Q values do not depend on this factor.…”
Section: Discussion (citation type: mentioning)
confidence: 99%
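
The excerpt's point is that risk sensitivity can be obtained without knowing transition probabilities. As one hedged illustration (not the cited authors' exact update), a model-free scheme can weight the temporal-difference error asymmetrically, in the spirit of risk-sensitive Q-learning; the parameter names and the specific weighting below are assumptions for the example.

```python
import numpy as np

def risk_sensitive_q_update(Q, s, a, r, s_next,
                            alpha=0.1, discount=0.95, kappa=0.5):
    """One illustrative model-free, risk-sensitive Q-learning step.

    For kappa in (0, 1), negative TD errors are amplified and positive
    ones damped, so downside surprises are penalized more heavily.
    This mirrors a generic risk-sensitive TD scheme, not the exact
    update of the work quoted above.
    """
    td = r + discount * Q[s_next].max() - Q[s, a]   # ordinary TD error
    weight = (1.0 - kappa) if td > 0 else (1.0 + kappa)
    Q[s, a] += alpha * weight * td
    return Q
```

Here kappa plays the role of a tunable risk (or "daring") parameter, and because it only reweights the learning step, it can be varied during the agent's lifetime without invalidating the stored Q values.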
“…There are several risk-sensitive approaches, such as the ones introduced by Howard and Matheson [Howard and Matheson, 1972] and Coraluppi and Marcus [Coraluppi and Marcus, 1999]. Both proposals make use of exponential utility functions.…”
Section: Related Work (citation type: mentioning)
confidence: 99%
“…The ETC and the EDC were first studied in, e.g., [18,20,31,32,33] for the finite state space model. Results in [18,20,31,33], characterizing the (exponential) optimal value function and policies, are extended in Chapter 3 to infinite state space models.…”
Section: Summary of Results (citation type: mentioning)
confidence: 99%
“…Results in [18,20,31,33], characterizing the (exponential) optimal value function and policies, are extended in Chapter 3 to infinite state space models. Moreover, optimization with respect to the set of randomized, history-dependent policies is considered here, whereas only Markovian deterministic policies were considered in [18,20,31,32,33]; see also [7] for related results. In Section 3.1, we present the exponential version of the policy evaluation algorithm (over a finite horizon), which plays a key role in the rest of the chapter.…”
Section: Summary of Results (citation type: mentioning)
confidence: 99%
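
For orientation, ETC and EDC in this excerpt presumably abbreviate the exponential total cost and exponential discounted cost criteria common in this literature; under that reading (sign conventions vary across the cited works), the finite-horizon ETC objective is typically written as

$$J^\pi_\gamma(x) \;=\; \frac{1}{\gamma}\,\log \mathbb{E}^\pi_x\!\left[\exp\!\Bigl(\gamma \sum_{t=0}^{N-1} c(x_t, a_t)\Bigr)\right],$$

to be minimized over policies π, with γ > 0 encoding risk aversion in a cost setting; the EDC variant discounts the accumulated costs. This is stated as background, not as the exact formulation of the chapter being quoted.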