“…Research in this direction was initiated by Mandl [12], Jaquette [5], [6], [8], Benito [1], and Sobel [17]. More recent extensions of these results can be found in [11], [9], and [16]. In particular, these references consider the variance (or second moment) of the total expected discounted or average rewards of controlled, discrete-time Markov reward chains, in order to determine the 'best' policy within the class of discounted (or average) optimal policies, namely one achieving a smaller variance (or lower second moment) of the cumulative reward.…”
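The selection criterion described above can be illustrated with a minimal Monte Carlo sketch (not taken from the cited references; the two-policy setup here is hypothetical): two stationary policies yield the same expected total discounted reward, but one produces a strictly smaller variance of the cumulative reward and would be preferred by the variance criterion.

```python
import random

def discounted_return(reward_fn, beta=0.9, horizon=200, rng=None):
    """One truncated sample of the total discounted reward sum_t beta^t * r_t."""
    rng = rng or random.Random()
    return sum((beta ** t) * reward_fn(rng) for t in range(horizon))

def mean_var(reward_fn, n=5000, seed=0):
    """Monte Carlo estimates of the mean and variance of the discounted return."""
    rng = random.Random(seed)
    xs = [discounted_return(reward_fn, rng=rng) for _ in range(n)]
    m = sum(xs) / n
    v = sum((x - m) ** 2 for x in xs) / n
    return m, v

# Hypothetical policy A: deterministic per-step reward 1.
# Hypothetical policy B: per-step reward 0 or 2 with equal probability
# (same mean reward per step, hence same expected discounted return).
m_a, v_a = mean_var(lambda rng: 1.0)
m_b, v_b = mean_var(lambda rng: rng.choice([0.0, 2.0]))

# Both means are close to 1/(1 - 0.9) = 10, but only policy A has
# (essentially) zero variance, so the variance criterion selects A.
```

Under the variance refinement discussed in the passage, both policies are discounted optimal, and the tie is broken in favor of the policy with the smaller variance of the cumulative reward.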