The variance of discounted Markov decision processes

Sobel, Milton

doi:10.1017/s0021900200023123

Cited by 79 publications (99 citation statements); references 9 publications

“…(c) Note that if the semi-Markov kernel Q(·, ·|x, a) takes some particular forms, our model reduces to the corresponding one for CTMDPs [10,11,12,20] or for DTMDPs [6,24,28,30]; see Section 5 for further details.…”

Section: The Control Model (mentioning)

“…The background of mean-variance problems lies in the tradeoff between the mean and the variance, and in the fact that a risk-averse investor usually prefers a return lower than the maximal one in order to keep the variance risk smaller. For this reason, mean-variance problems have been widely studied for various dynamic systems described by stochastic differential equations [5,7,22,31], Markov decision processes (MDPs) [2,3,8,10,13,21,27,28], and so on.…”

Section: Introduction (mentioning)

“…On mean-variance problems in MDPs there are many references; see [4,19,25,28] for the finite horizon reward variance, [6,10,12,20,28,30] for the infinite horizon discounted reward variance, [11,24,30] for the first passage variance, and [2,6,8,9,13,14,21,27,29,32] for the limiting average variance. To the best of our knowledge, most of the aforementioned works focus on solving mean-variance problems in discrete-time MDPs (DTMDPs) [3,4,6,13,14,21,24,25,28,30,32] and in continuous-time MDPs (CTMDPs) [8,9,10,11,12,20,27]; only a few works address mean-variance problems in semi-Markov decision processes (SMDPs); see [2,28] for finite SMDPs and [19] for a finite time horizon.…”

Section: Introduction (mentioning)
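
As an illustration of the infinite horizon discounted reward variance mentioned above, the following is a minimal sketch for a fixed stationary policy on a finite state space. It assumes a constant discount factor beta in [0, 1), a deterministic reward r(s) per state, and a transition matrix P already induced by the policy; the function names are hypothetical and the code is not taken from any of the cited papers. Writing the return as G = r(X_0) + beta G', where G' is the return from X_1, gives V = r + beta P V for the mean and M = r^2 + 2 beta r (P V) + beta^2 P M for the second moment, so the variance is M - V^2.

```python
def solve(A, b):
    """Solve the small dense system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [[float(v) for v in row] + [float(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for i in range(col + 1, n):
            factor = M[i][col] / M[col][col]
            for j in range(col, n + 1):
                M[i][j] -= factor * M[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def discounted_mean_and_variance(P, r, beta):
    """Mean V and variance of G = sum_t beta**t * r(X_t) from each start state.

    P    : n x n transition matrix of the chain induced by a fixed stationary policy
    r    : length-n list of deterministic one-step rewards r(s) (assumption)
    beta : constant discount factor, 0 <= beta < 1
    """
    n = len(r)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    # First moment: V = r + beta * P V
    A1 = [[identity[i][j] - beta * P[i][j] for j in range(n)] for i in range(n)]
    V = solve(A1, r)
    PV = [sum(P[i][j] * V[j] for j in range(n)) for i in range(n)]
    # Second moment: M = r^2 + 2 * beta * r * (P V) + beta^2 * P M
    A2 = [[identity[i][j] - beta**2 * P[i][j] for j in range(n)] for i in range(n)]
    M = solve(A2, [r[i] ** 2 + 2 * beta * r[i] * PV[i] for i in range(n)])
    # Variance = second moment minus squared mean, per start state
    return V, [M[i] - V[i] ** 2 for i in range(n)]
```

For the chain P = [[0.5, 0.5], [0.5, 0.5]] with r = [1, 0] and beta = 0.5, this gives means 1.5 and 0.5 and a variance of 1/12 from either start state; a chain that never moves has zero variance, as expected.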

“…To the best of our knowledge, most of the aforementioned works focus on solving mean-variance problems in discrete-time MDPs (DTMDPs) [3,4,6,13,14,21,24,25,28,30,32] and in continuous-time MDPs (CTMDPs) [8,9,10,11,12,20,27]; only a few works address mean-variance problems in semi-Markov decision processes (SMDPs); see [2,28] for finite SMDPs and [19] for a finite time horizon. Moreover, most of the existing works on mean-variance problems for MDPs deal with fixed finite or infinite time horizons.…”

Section: Introduction (mentioning)

“…We assume that the state and control sets are Borel spaces, while the reward rates may be unbounded from both above and below. The discount factor may depend on states and controls, which extends both the constant discount factors used in previous studies [6,10,12,20,28] and the merely state-dependent ones in [30]. A varying discount factor, rather than a fixed constant one, is motivated by practical cases such as the interest rate in economic and financial systems [1,15,23], which can be adjusted according to actual circumstances.…”

Section: Introduction (mentioning)
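
To make the idea of a discount factor that depends on both the state and the control concrete, here is a value-iteration sketch on a finite model. It is a deliberate simplification of the setting quoted above: finite state and action sets instead of Borel spaces, bounded rewards, and the plain expected discounted reward instead of a mean-variance or first passage criterion. The dictionary layout of P, r and alpha and the function name are hypothetical. Because every alpha(s, a) is assumed to be below 1, the update is a contraction with modulus max alpha, so the iteration converges.

```python
def value_iteration(states, actions, P, r, alpha, tol=1e-10, max_iter=100_000):
    """Discounted value iteration with a discount factor alpha[s][a] that varies with (s, a).

    Hypothetical finite-model layout (not from the cited paper):
      actions[s]  : list of actions available in state s
      P[s][a]     : dict {next_state: probability}
      r[s][a]     : expected one-step reward
      alpha[s][a] : discount factor, assumed to lie in [0, 1)
    """

    def q(V, s, a):
        # One-step lookahead: reward now plus (s, a)-dependent discounting of the future
        return r[s][a] + alpha[s][a] * sum(p * V[t] for t, p in P[s][a].items())

    V = {s: 0.0 for s in states}
    for _ in range(max_iter):
        new_V = {s: max(q(V, s, a) for a in actions[s]) for s in states}
        gap = max(abs(new_V[s] - V[s]) for s in states)
        V = new_V
        if gap < tol:  # the contraction property guarantees this eventually holds
            break
    # Greedy stationary deterministic policy with respect to the final value estimate
    policy = {s: max(actions[s], key=lambda a, s=s: q(V, s, a)) for s in states}
    return V, policy
```

For example, if staying in state 0 earns 1 per step discounted at 0.9, while moving earns 0 and leads to an absorbing state 1 that earns 2 per step discounted at 0.5, the values are 10 and 4 and staying in state 0 is optimal.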

Mean-variance optimality for semi-Markov decision processes under first passage criteria

Huang, Huang

2017

Kybernetika

Minimizing Risk Models in Markov Decision Processes with Policies Depending on Target Values

Lin

1999

Journal of Mathematical Analysis and Applications

This paper studies minimizing risk problems in Markov decision processes with a countable state space and reward set. The objective is to find a policy that minimizes the probability (risk) that the total discounted rewards do not exceed a specified value (target). In this sort of model, the decision made by the decision maker depends not only on the system's states but also on his target values. By introducing the decision maker's state, we formulate a framework for minimizing risk models. The policies discussed depend on target values, and the rewards may be arbitrary real numbers. For the finite horizon model, the main results obtained are: (i) the optimal value functions are distribution functions of the target, (ii) there exists an optimal deterministic Markov policy, and (iii) a policy is optimal if and only if at each realizable state it always takes an optimal action. In addition, we obtain a sufficient condition and a necessary condition for the existence of a finite horizon optimal policy independent of targets, and we give an algorithm computing finite horizon optimal policies and optimal value functions. For the infinite horizon model, we establish the optimality equation and obtain a structural property of optimal policies. We prove that the optimal value function is a distribution function of the target, and we present a new approximation formula that generalizes the one for nonnegative rewards. An example illustrating mistakes in the previous literature shows that the existence of an optimal policy had not actually been proved. In this paper, we give an existence condition, which is a necessary and sufficient condition for the existence of an infinite horizon optimal policy independent of targets, and we point out that whether an optimal policy exists remains an open problem in the general case.
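
The role of the target as part of the decision maker's state can be sketched with a finite-horizon recursion. If the total discounted reward is R = r + beta R', where r is the first reward and R' the discounted reward from the next state, then for beta > 0 the event R <= x holds exactly when R' <= (x - r)/beta. This suggests F_0(s, x) = 1 if x >= 0 and 0 otherwise, and F_n(s, x) = min over a of the sum of p * F_{n-1}(s', (x - r)/beta) over outcomes (p, r, s'). The code below assumes finite states and actions, rewards attached to transitions, and exact Fraction arithmetic for targets; the names are hypothetical, and it is an illustration under these assumptions, not a reproduction of the algorithm given in the paper.

```python
from fractions import Fraction
from functools import lru_cache


def make_risk_solver(P, beta):
    """Finite-horizon risk minimization with the target as part of the decision maker's state.

    Hypothetical finite model (an illustration, not the paper's general setting):
      P[s][a] : list of (probability, reward, next_state) triples
      beta    : discount factor with beta > 0; use Fraction for exact target arithmetic
    Returns F(n, s, x) -> (risk, action): the minimal probability that the n-step
    total discounted reward from state s does not exceed the target x, and a
    minimizing first action (None when n == 0).
    """

    @lru_cache(maxsize=None)
    def F(n, s, x):
        if n == 0:
            # No steps left: the total reward is 0, so the risk is 1 if 0 <= x, else 0
            return (Fraction(1) if x >= 0 else Fraction(0)), None
        best_risk, best_action = None, None
        for a, outcomes in P[s].items():
            # R = reward + beta * R', so R <= x  <=>  R' <= (x - reward) / beta
            risk = sum(
                (p * F(n - 1, t, (x - reward) / beta)[0] for p, reward, t in outcomes),
                Fraction(0),
            )
            if best_risk is None or risk < best_risk:
                best_risk, best_action = risk, a
        return best_risk, best_action

    return F
```

For instance, with a single state offering a "safe" action (reward 1) and a "risky" one (reward 3 or 0 with probability 1/2 each) and beta = 1/2, a one-step target of 1/2 makes "safe" optimal (risk 0), while a target of 1 makes "risky" optimal (risk 1/2), so the optimal action changes with the target.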
