2004
DOI: 10.1007/978-3-642-55884-9_3
Optimal Solutions for Undiscounted Variance Penalized Markov Decision Chains

Cited by 6 publications (3 citation statements). References 14 publications.
“…Huang and Kallenberg [4] unified and extended the formulations and existence results obtained in [3], [10], and [18]. Finally, in [16] it was shown that optimal policies with respect to the standard mean-variance optimality criteria can be found among the vertices of a special convex polyhedron, and a policy iteration method was suggested to find these vertices. It is important to note that, in these papers, the 'variance' used when finding the optimal policy with respect to the various mean-variance optimality criteria is taken only over one-stage rewards and not as the variance of the cumulative reward.…”
Section: Considered (mentioning)
confidence: 92%
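To make the distinction in the statement above concrete, here is a minimal sketch in LaTeX; the notation is ours and not taken from the cited papers. It contrasts the one-stage reward variance used in those formulations with the (asymptotic) variance of the cumulative reward in the average-reward setting.

% Hedged sketch; notation assumed, not from the source.
% For a stationary policy \pi with stationary distribution \mu_\pi,
% one-stage reward r(X_t), and long-run average g_\pi = \mathbb{E}_{\mu_\pi}[r(X)]:
\[
  V^{\text{one-stage}}_\pi \;=\; \mathbb{E}_{\mu_\pi}\!\bigl[(r(X) - g_\pi)^2\bigr],
  \qquad
  V^{\text{cumulative}}_\pi \;=\; \lim_{n\to\infty} \frac{1}{n}\,
  \operatorname{Var}\!\Bigl(\sum_{t=0}^{n-1} r(X_t)\Bigr).
\]
% The two generally differ, since V^{cumulative} also picks up the serial
% correlation of rewards along the chain; a variance-penalized criterion of
% the form g_\pi - \lambda V_\pi therefore depends on which variance is used.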
“…Research in this direction was initiated by Mandl [12], Jaquette [5], [6], [8], Benito [1], and Sobel [17]. More recent extensions of these results can be found in [11], [9], and [16]. In particular, these references consider the variance (or second moment) of the total expected discounted or average rewards of controlled, discrete-time Markov reward chains, in order to determine the 'best' policy within the class of discounted (or average) optimal policies, i.e. one with a smaller variance (or lower second moment) of the cumulative reward.…”
Section: Motivation (mentioning)
confidence: 99%