2014
DOI: 10.1007/978-3-319-11439-2_10

Reachability in MDPs: Refining Convergence of Value Iteration

Abstract: Markov Decision Processes (MDPs) are a widely used model featuring both non-deterministic and probabilistic choices. Minimal and maximal probabilities to reach a target set of states, with respect to a policy resolving non-determinism, may be computed by several methods, including value iteration. This algorithm, easy to implement and efficient in terms of space complexity, consists of iteratively computing the probabilities of paths of increasing length. However, it raises three issues: (1) defining a s…
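The iterative scheme the abstract refers to can be sketched as follows. This is a generic illustration of value iteration for maximal reachability probabilities, not the authors' implementation; the dictionary-based MDP encoding and the stopping threshold are assumptions made for the example.

```python
# Generic sketch of value iteration for maximal reachability
# probabilities; the MDP encoding and stopping rule are assumptions
# for this example, not the paper's implementation.

def value_iteration_max_reach(mdp, targets, epsilon=1e-6):
    """mdp maps state -> action -> list of (probability, successor)."""
    x = {s: (1.0 if s in targets else 0.0) for s in mdp}
    while True:
        diff, new_x = 0.0, {}
        for s in mdp:
            if s in targets:
                new_x[s] = 1.0
                continue
            # Bellman backup: pick the best action under current values.
            new_x[s] = max(
                (sum(p * x[t] for p, t in succ) for succ in mdp[s].values()),
                default=0.0,  # deadlock states keep value 0
            )
            diff = max(diff, abs(new_x[s] - x[s]))
        x = new_x
        if diff < epsilon:  # standard criterion; *not* guaranteed ε-precise
            return x

# From s0, action 'a' reaches the goal with probability 0.5 and loops
# otherwise, so the true maximal reachability probability is 1.0.
mdp = {"s0": {"a": [(0.5, "goal"), (0.5, "s0")]}, "goal": {}}
print(value_iteration_max_reach(mdp, {"goal"})["s0"])
```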

Cited by 52 publications (81 citation statements)
References 6 publications
“…Practical implementations, however, are often based on numerical methods that only approximate the correct solution. In fact, methods based on value iteration, the de facto standard in MDP model checking, do not give any guarantee on the accuracy of the obtained result [26]. We therefore consider interval iteration [5,9], which for a predefined precision ε > 0 guarantees that the obtained result x_s is ε-precise, i.e.…”
Section: Definition 8, The Epoch Model of MDP M
confidence: 99%
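Interval iteration, as invoked in the statement above, maintains a lower and an upper approximation that provably bracket the true value and stops once they are ε apart. A minimal sketch, assuming the MDP has already been preprocessed (end components collapsed) so that the upper sequence converges, as [5,9] require:

```python
# Minimal sketch of interval iteration, assuming end components were
# already collapsed in preprocessing so the upper bounds converge.

def interval_iteration_max_reach(mdp, targets, sinks, epsilon=1e-6):
    lo = {s: (1.0 if s in targets else 0.0) for s in mdp}
    hi = {s: (0.0 if s in sinks else 1.0) for s in mdp}
    while max(hi[s] - lo[s] for s in mdp) > epsilon:
        for x in (lo, hi):  # same Bellman backup on both bounding vectors
            for s in mdp:
                if s in targets or s in sinks:
                    continue
                x[s] = max(
                    (sum(p * x[t] for p, t in succ) for succ in mdp[s].values()),
                    default=0.0,
                )
    # The true value lies in [lo[s], hi[s]], so the midpoint is
    # within epsilon/2 of it.
    return {s: (lo[s] + hi[s]) / 2 for s in mdp}
```

Unlike the plain scheme, the returned value comes with a certified error bound; on the toy MDP above (with sinks = set()) the two bounds close in on the true value 1.0.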
“…Guaranteed ε-close results could be achieved at the cost of precomputing and reducing a maximal end-component decomposition of the MDP [7]. In this paper, we thus write VI to refer to an ideal ε-correct algorithm, but for the sake of comparison use the standard implementation in our experiments in Sect.…”
Section: Definition
confidence: 99%
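To see why the distinction between the ideal ε-correct algorithm and the standard implementation matters, consider a toy model (my own construction, not taken from [7]): a single state that escapes to the target with a tiny probability p and loops otherwise. The usual stopping rule fires as soon as two consecutive iterates differ by less than ε, which here happens immediately even though the estimate is far from the true value 1.

```python
# Toy construction (not from [7]) showing that plain value iteration's
# usual stopping rule gives no accuracy guarantee: with escape
# probability p < epsilon, the very first update already changes the
# estimate by less than epsilon, yet the true value is 1.

def vi_self_loop(p, epsilon=1e-6):
    x, steps = 0.0, 0
    while True:
        new_x = p + (1 - p) * x     # Bellman update for the loop state
        steps += 1
        if abs(new_x - x) < epsilon:
            return new_x, steps
        x = new_x

value, steps = vi_self_loop(p=1e-7)
print(value, steps)   # stops after 1 step near 0.0; true value is 1.0
```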
“…, LP_{i−1}, with r being the maximal reward that occurs in the MDP. Since LP does not scale to large MDPs [7], the technique has been reconsidered using value iteration instead [14]. Using the transformations and assumptions introduced above, we can formulate it as in Algorithm 1.…”
Section: Sequential Value Iteration
confidence: 99%
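The sequential technique sketched in this statement solves one sub-problem per reward level, each reusing the values of the levels below it. The following is a simplified sketch under the assumption that every action accrues reward exactly 1, so each level reduces to a single Bellman backup; Algorithm 1 in the cited work handles the general case with a full value iteration per level.

```python
# Simplified sketch of sequential value iteration for reward-bounded
# reachability, assuming each action accrues reward exactly 1; the
# general algorithm [14] runs a full value iteration per reward level.

def bounded_reach_max(mdp, targets, bound):
    # x[s] = max probability of reaching `targets` within the remaining
    # budget; with unit rewards, the budget is the number of steps.
    x = {s: (1.0 if s in targets else 0.0) for s in mdp}
    for _ in range(bound):  # one backup per reward level
        x = {
            s: 1.0 if s in targets else max(
                (sum(p * x[t] for p, t in succ) for succ in mdp[s].values()),
                default=0.0,
            )
            for s in mdp
        }
    return x
```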
“…Currently, the most widely used algorithm to model check MDPs in practice is value iteration (see, e.g., [Kwiatkowska et al., 2011; Haddad and Monmege, 2014]). The known upper bound on the number of iterations required for the value obtained to satisfy any formal guarantees is polynomial in the size of the representation of the smallest transition probability of the MDP.…”
Section: Model Checking Experiments
confidence: 99%
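The dependence on the smallest transition probability can be illustrated on the self-loop toy model from above: iterates satisfy 1 − x_k = (1 − p)^k, so reaching accuracy ε takes on the order of ln(1/ε)/p iterations, and halving p roughly doubles the iteration count. This is only a toy scaling illustration, not the formal bound referenced in the statement.

```python
# Iteration counts for the self-loop model as the smallest transition
# probability p shrinks: x_{k+1} = p + (1 - p) * x_k, hence
# 1 - x_k = (1 - p)^k, and accuracy eps needs roughly ln(1/eps)/p steps.

import math

def iterations_to_accuracy(p, eps=1e-6):
    k, err = 0, 1.0           # err = 1 - x_k = (1 - p)^k
    while err > eps:
        err *= 1 - p
        k += 1
    return k

for p in (1e-1, 1e-2, 1e-3, 1e-4):
    print(p, iterations_to_accuracy(p), round(math.log(1e6) / p))
```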