2014
DOI: 10.1007/978-3-319-11439-2_10

Reachability in MDPs: Refining Convergence of Value Iteration

Abstract: Markov Decision Processes (MDPs) are a widely used model featuring both non-deterministic and probabilistic choices. Minimal and maximal probabilities to reach a target set of states, with respect to a policy resolving non-determinism, may be computed by several methods, including value iteration. This algorithm, easy to implement and efficient in terms of space complexity, consists of iteratively computing the probabilities of paths of increasing length. However, it raises three issues: (1) defining a s…
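The iterative scheme the abstract refers to can be sketched as follows. This is a generic illustration of value iteration for maximal reachability probabilities, not the authors' implementation; the dictionary-based MDP encoding and the stopping threshold are assumptions made for the example.

```python
# Generic sketch of value iteration for maximal reachability
# probabilities; the MDP encoding and stopping rule are assumptions
# for this example, not the paper's implementation.

def value_iteration_max_reach(mdp, targets, epsilon=1e-6):
    """mdp maps state -> action -> list of (probability, successor)."""
    x = {s: (1.0 if s in targets else 0.0) for s in mdp}
    while True:
        diff, new_x = 0.0, {}
        for s in mdp:
            if s in targets:
                new_x[s] = 1.0
                continue
            # Bellman backup: pick the best action under current values.
            new_x[s] = max(
                (sum(p * x[t] for p, t in succ) for succ in mdp[s].values()),
                default=0.0,  # deadlock states keep value 0
            )
            diff = max(diff, abs(new_x[s] - x[s]))
        x = new_x
        if diff < epsilon:  # standard criterion; *not* guaranteed ε-precise
            return x

# From s0, action 'a' reaches the goal with probability 0.5 and loops
# otherwise, so the true maximal reachability probability is 1.0.
mdp = {"s0": {"a": [(0.5, "goal"), (0.5, "s0")]}, "goal": {}}
print(value_iteration_max_reach(mdp, {"goal"})["s0"])
```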

Cited by 52 publications (81 citation statements)
References 6 publications
“…Practical implementations, however, are often based on numerical methods that only approximate the correct solution. In fact, methods based on value iteration, the de facto standard in MDP model checking, do not give any guarantee on the accuracy of the obtained result [26]. We therefore consider interval iteration [5,9], which for a predefined precision ε > 0 guarantees that the obtained result x_s is ε-precise, i.e.…”
Section: Definition 8, The Epoch Model of MDP M
confidence: 99%
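Interval iteration, as invoked in the statement above, maintains a lower and an upper approximation that provably bracket the true value and stops once they are ε apart. A minimal sketch, assuming the MDP has already been preprocessed (end components collapsed) so that the upper sequence converges, as [5,9] require:

```python
# Minimal sketch of interval iteration, assuming end components were
# already collapsed in preprocessing so the upper bounds converge.

def interval_iteration_max_reach(mdp, targets, sinks, epsilon=1e-6):
    lo = {s: (1.0 if s in targets else 0.0) for s in mdp}
    hi = {s: (0.0 if s in sinks else 1.0) for s in mdp}
    while max(hi[s] - lo[s] for s in mdp) > epsilon:
        for x in (lo, hi):  # same Bellman backup on both bounding vectors
            for s in mdp:
                if s in targets or s in sinks:
                    continue
                x[s] = max(
                    (sum(p * x[t] for p, t in succ) for succ in mdp[s].values()),
                    default=0.0,
                )
    # The true value lies in [lo[s], hi[s]], so the midpoint is
    # within epsilon/2 of it.
    return {s: (lo[s] + hi[s]) / 2 for s in mdp}
```

Unlike the plain scheme, the returned value comes with a certified error bound; on the toy MDP above (with sinks = set()) the two bounds close in on the true value 1.0.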
“…Guaranteed ε-close results could be achieved at the cost of precomputing and reducing a maximal end-component decomposition of the MDP [7]. In this paper, we thus write VI to refer to an ideal ε-correct algorithm, but for the sake of comparison use the standard implementation in our experiments in Sect.…”
Section: Definition
confidence: 99%
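To see why the distinction between the ideal ε-correct algorithm and the standard implementation matters, consider a toy model (my own construction, not taken from [7]): a single state that escapes to the target with a tiny probability p and loops otherwise. The usual stopping rule fires as soon as two consecutive iterates differ by less than ε, which here happens immediately even though the estimate is far from the true value 1.

```python
# Toy construction (not from [7]) showing that plain value iteration's
# usual stopping rule gives no accuracy guarantee: with escape
# probability p < epsilon, the very first update already changes the
# estimate by less than epsilon, yet the true value is 1.

def vi_self_loop(p, epsilon=1e-6):
    x, steps = 0.0, 0
    while True:
        new_x = p + (1 - p) * x     # Bellman update for the loop state
        steps += 1
        if abs(new_x - x) < epsilon:
            return new_x, steps
        x = new_x

value, steps = vi_self_loop(p=1e-7)
print(value, steps)   # stops after 1 step near 0.0; true value is 1.0
```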
“…, LP_{i−1}, with r being the maximal reward that occurs in the MDP. Since LP does not scale to large MDPs [7], the technique has been reconsidered using value iteration instead [14]. Using the transformations and assumptions introduced above, we can formulate it as in Algorithm 1.…”
Section: Sequential Value Iteration
confidence: 99%
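The sequential technique sketched in this statement solves one sub-problem per reward level, each reusing the values of the levels below it. The following is a simplified sketch under the assumption that every action accrues reward exactly 1, so each level reduces to a single Bellman backup; Algorithm 1 in the cited work handles the general case with a full value iteration per level.

```python
# Simplified sketch of sequential value iteration for reward-bounded
# reachability, assuming each action accrues reward exactly 1; the
# general algorithm [14] runs a full value iteration per reward level.

def bounded_reach_max(mdp, targets, bound):
    # x[s] = max probability of reaching `targets` within the remaining
    # budget; with unit rewards, the budget is the number of steps.
    x = {s: (1.0 if s in targets else 0.0) for s in mdp}
    for _ in range(bound):  # one backup per reward level
        x = {
            s: 1.0 if s in targets else max(
                (sum(p * x[t] for p, t in succ) for succ in mdp[s].values()),
                default=0.0,
            )
            for s in mdp
        }
    return x
```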
“…Currently, the most widely used algorithm to model check MDPs in practice is value iteration (see, e.g., [Kwiatkowska et al., 2011; Haddad and Monmege, 2014]). The known upper bound on the number of iterations required for the value obtained to satisfy any formal guarantees is polynomial in the size of the representation of the smallest transition probability of the MDP.…”
Section: Model Checking Experiments
confidence: 99%
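The dependence on the smallest transition probability can be illustrated on the self-loop toy model from above: iterates satisfy 1 − x_k = (1 − p)^k, so reaching accuracy ε takes on the order of ln(1/ε)/p iterations, and halving p roughly doubles the iteration count. This is only a toy scaling illustration, not the formal bound referenced in the statement.

```python
# Iteration counts for the self-loop model as the smallest transition
# probability p shrinks: x_{k+1} = p + (1 - p) * x_k, hence
# 1 - x_k = (1 - p)^k, and accuracy eps needs roughly ln(1/eps)/p steps.

import math

def iterations_to_accuracy(p, eps=1e-6):
    k, err = 0, 1.0           # err = 1 - x_k = (1 - p)^k
    while err > eps:
        err *= 1 - p
        k += 1
    return k

for p in (1e-1, 1e-2, 1e-3, 1e-4):
    print(p, iterations_to_accuracy(p), round(math.log(1e6) / p))
```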