2012 IEEE 51st Conference on Decision and Control (CDC)
DOI: 10.1109/cdc.2012.6426504

Loss bounds for uncertain transition probabilities in Markov decision processes

Abstract: We analyze losses resulting from uncertain transition probabilities in Markov decision processes with bounded nonnegative rewards. We assume that policies are precomputed using exact dynamic programming with the estimated transition probabilities, but the system evolves according to different, true transition probabilities. Given a bound on the total variation error of estimated transition probability distributions, we derive upper bounds on the loss of expected total reward. The approach analyzes the…
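For orientation, bounds of this type relate the estimation error of the transition model to the value lost by the precomputed policy. One representative form for the discounted case (an illustrative, simulation-lemma style bound under stated assumptions, not necessarily the paper's exact result) is:

\[
0 \;\le\; J^{*}(s) - J^{\hat{\pi}}(s) \;\le\; \frac{2\gamma\,\varepsilon\,R_{\max}}{(1-\gamma)^{2}},
\qquad
\varepsilon \;=\; \max_{s,a}\,\bigl\| \hat{P}(\cdot \mid s,a) - P(\cdot \mid s,a) \bigr\|_{TV},
\]

where \(\hat{\pi}\) is the policy computed by dynamic programming on the estimated model \(\hat{P}\), \(J^{*}\) and \(J^{\hat{\pi}}\) are values under the true model \(P\), rewards lie in \([0, R_{\max}]\), and \(\gamma < 1\) is the discount factor. The exact constants vary with the norm convention and with the reward/cost setting.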

Cited by 13 publications (8 citation statements)
References 17 publications (15 reference statements)
“…Instead of adopting existing approaches in non-linearity measure, SNM adopts the approach commonly used for sensitivity analysis [21,22] of Markov Decision Processes (MDP) - a special class of POMDP where uncertainty is only in the effect of performing actions. It is based on the statistical distance measure between the true transition dynamics and its perturbed versions.…”
Section: Related Work
confidence: 99%
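To make the statistical-distance idea concrete, here is a minimal sketch (hypothetical code, not from either paper) that computes the worst-case total variation distance between a true transition kernel and an estimated or perturbed one:

import numpy as np

def total_variation(p, q):
    # Total variation distance between two discrete distributions.
    return 0.5 * np.abs(np.asarray(p) - np.asarray(q)).sum()

def max_tv_error(P_true, P_est):
    # Worst-case TV error over all (state, action) pairs.
    # P_true, P_est: arrays of shape (S, A, S), where P[s, a, s'] is the
    # probability of moving from s to s' when taking action a in state s.
    S, A, _ = P_true.shape
    return max(total_variation(P_true[s, a], P_est[s, a])
               for s in range(S) for a in range(A))

# Example: a 2-state, 1-action chain and a slightly perturbed estimate.
P = np.array([[[0.9, 0.1]], [[0.2, 0.8]]])
P_hat = np.array([[[0.85, 0.15]], [[0.25, 0.75]]])
print(max_tv_error(P, P_hat))  # ≈ 0.05

This quantity plays the role of the error bound epsilon in total-variation-based loss analyses.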
“…We now show that the proposed algorithm converges to the optimal long-term average cost. This analysis follows from the favorable redundancy properties of the CTW probability estimates, and some results from the theory of exact dynamic programming with estimated transition probabilities [53]. The basic idea is to define criteria for transition probability and cost-to-go estimates for which acting according to our estimated values will be equivalent to acting on P and J respectively.…”
Section: A. Asymptotic Analysis
confidence: 99%
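As a concrete illustration of this pattern (a minimal hypothetical sketch, not the CTW-based algorithm or the construction of [53]): plan with value iteration on the estimated model, then evaluate the resulting greedy policy under the true model.

import numpy as np

def value_iteration(P, R, gamma=0.9, iters=1000):
    # P: (S, A, S) transition probabilities; R: (S, A) expected rewards.
    S, A, _ = P.shape
    J = np.zeros(S)
    for _ in range(iters):
        Q = R + gamma * (P @ J)      # Q[s, a] = R[s, a] + gamma * E[J(s') | s, a]
        J = Q.max(axis=1)
    return J, Q.argmax(axis=1)       # value estimate and greedy policy

def policy_value(P, R, pi, gamma=0.9):
    # Exact value of a fixed policy: solve (I - gamma * P_pi) J = R_pi.
    S = P.shape[0]
    idx = np.arange(S)
    return np.linalg.solve(np.eye(S) - gamma * P[idx, pi], R[idx, pi])

# Plan on the estimated model P_hat, but incur value under the true model P:
#   _, pi_hat = value_iteration(P_hat, R)
#   loss = policy_value(P, R, pi_star) - policy_value(P, R, pi_hat)

If the transition and cost-to-go estimates are close enough to the true quantities, the greedy policy computed from the estimates coincides with the one computed from (P, J), which is the equivalence the quoted passage builds on.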
“…The existence of ε follows immediately from the fact that the dynamic programming operator is a contraction mapping [49, Proposition 4.1] and the finiteness of the state-action space. The optimality of acting on these estimates follows from the assumption of bounded cost per stage [53,Theorem 2].…”
Section: A. Asymptotic Analysis
confidence: 99%
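For reference, the contraction property invoked here is the standard one for the discounted dynamic programming operator, written here for the cost formulation used in the quoted passage (a textbook statement, not a reproduction of [49]):

\[
(TJ)(s) = \min_{a}\Bigl[\, g(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\, J(s') \,\Bigr],
\qquad
\|TJ - TJ'\|_{\infty} \le \gamma\,\|J - J'\|_{\infty}.
\]

For \(\gamma < 1\) this yields a unique fixed point \(J^{*}\), and on a finite state-action space the costs of distinct actions are separated by some positive gap; hence any cost-to-go estimate within a sufficiently small ε of \(J^{*}\) induces the same greedy actions, which is the ε whose existence the passage asserts.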
“…SNM is designed to address these issues. Instead of building upon existing non-linearity measures, SNM adopts approaches commonly used for sensitivity analysis [23], [24] of Markov Decision Processes (MDP) - a special class of POMDP where the observation model is perfect, and therefore the system is fully observable. These approaches use statistical distance measures between the original transition dynamics and their perturbed versions.…”
Section: B. Related Work on Non-linearity Measures
confidence: 99%
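A minimal sketch of this sensitivity-analysis viewpoint (hypothetical code, self-contained) perturbs the transition kernel and measures how much the value of a fixed policy shifts as the perturbation grows:

import numpy as np

def perturb_kernel(P, delta, rng):
    # Mix each transition row with a random distribution: P' = (1-delta) P + delta N.
    noise = rng.random(P.shape)
    noise /= noise.sum(axis=-1, keepdims=True)   # normalize rows to distributions
    return (1.0 - delta) * P + delta * noise

def policy_value(P, R, pi, gamma=0.9):
    # Exact value of fixed policy pi: solve (I - gamma * P_pi) J = R_pi.
    S = P.shape[0]
    idx = np.arange(S)
    return np.linalg.solve(np.eye(S) - gamma * P[idx, pi], R[idx, pi])

rng = np.random.default_rng(0)
S, A = 4, 2
P = rng.random((S, A, S)); P /= P.sum(axis=-1, keepdims=True)
R = rng.random((S, A))
pi = np.zeros(S, dtype=int)                      # some fixed policy

for delta in (0.0, 0.05, 0.1, 0.2):
    P_d = perturb_kernel(P, delta, rng)
    gap = np.abs(policy_value(P, R, pi) - policy_value(P_d, R, pi)).max()
    print(f"delta={delta:.2f}  max value gap={gap:.4f}")

The value gap grows with the statistical distance between the original and perturbed dynamics, which is exactly the sensitivity that such measures are designed to capture.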