2022
DOI: 10.1007/978-3-031-19992-9_18
Optimistic and Topological Value Iteration for Simple Stochastic Games

Cited by 9 publications (11 citation statements)
References 26 publications
“…However, when we extracted the induced MDPs, we found them all easy for VI. Similarly, [3] used a random generation of SGs of at most 10,000 states, many of which were challenging for the SG algorithms. Yet the same random generation modified to produce MDPs delivered only MDPs easily solved in seconds, even with drastically increased numbers of states.…”
Section: Discussion
confidence: 99%
“…This uniformity may be misleading. Indeed, for some stochastic game algorithms, using LP to solve the underlying MDPs may be preferential [3,Appendix E.4]. An application in runtime assurance preferred PI for numerical stability [45,Sect.…”
Section: Introduction
confidence: 99%
“…Kleene's fixpoint theorem suggests a simple method for approximating the lfp μφ from below: simply iterate φ starting at 0, i.e., compute the sequence l_0 = 0, l_1 = φ(l_0), l_2 = φ(l_1), etc. In the context of MDP, this iterative scheme is known as Value Iteration (VI). VI is easy to implement, but it is difficult to decide when to stop the iteration.…”
Section: The Optimistic Value Iteration Algorithm
confidence: 99%
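
To make the scheme in the preceding quote concrete, the following is a minimal Python sketch of plain value iteration for maximal reachability probabilities. The dictionary-based MDP encoding is a hypothetical illustration chosen here, not the representation used in the cited papers. It iterates the Bellman operator φ from the zero vector (l_0 = 0, l_{k+1} = φ(l_k)) and, as the quote notes, has no sound stopping criterion of its own, so it simply fixes the iteration count.

# Sketch of Kleene/value iteration for max-reachability in an MDP.
# Encoding (assumed for illustration): transitions maps each state to a list
# of actions, each action a list of (successor, probability) pairs.
def value_iteration(transitions, targets, num_iterations=1000):
    """Return a lower bound on maximal reachability probabilities."""
    states = set(transitions) | set(targets)
    values = {s: (1.0 if s in targets else 0.0) for s in states}  # l_0
    for _ in range(num_iterations):
        new_values = dict(values)
        for s, actions in transitions.items():
            if s in targets or not actions:
                continue
            # phi: best (maximising) action's expected value under the old vector
            new_values[s] = max(
                sum(p * values[t] for t, p in action) for action in actions
            )
        values = new_values  # l_{k+1} = phi(l_k)
    return values

# Tiny usage example on a hypothetical two-state MDP:
# value_iteration({"s0": [[("goal", 0.5), ("s0", 0.5)]], "goal": []}, {"goal"})
# converges towards {"s0": 1.0, "goal": 1.0}.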
“…In a nutshell, the idea of OVI is to compute some lower bound l on the solution – which can be done using an approximative iterative algorithm – and then optimistically guess an upper bound u = l + ε and verify that the guess was correct. Prior to our paper, OVI had only been considered in Markov Decision Processes (MDP) [22] and Stochastic Games (SG) [1], where it is used to compute bounds on, e.g., maximal reachability probabilities. The upper bounds computed by OVI have a special property: They are self-certifying (also called inductive in our paper): Given the system and the bounds, one can check very easily that the bounds are indeed correct.…”
Section: Introduction
confidence: 99%
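
The quoted description of OVI can be illustrated with a short Python sketch: iterate to a lower bound l, optimistically guess u = l + ε, and verify the guess with an inductive check φ(u) ≤ u (which, for a monotone φ, certifies u ≥ lfp φ). The operator phi, the tolerances, and the failure handling are simplified assumptions; the full algorithms in the cited papers refine the guess and retry rather than giving up.

# Sketch of the optimistic value iteration (OVI) idea under the assumptions above.
def optimistic_value_iteration(phi, num_states, eps=1e-6, tol=1e-8, max_iters=10**6):
    """phi maps a list of floats to a list of floats and is assumed monotone."""
    lower = [0.0] * num_states
    for _ in range(max_iters):
        new_lower = phi(lower)
        done = max(abs(a - b) for a, b in zip(new_lower, lower)) < tol
        lower = new_lower
        if done:
            break  # lower bound has (numerically) stabilised
    # Optimistic guess: add eps to every component (probabilities are capped at 1).
    upper = [min(v + eps, 1.0) for v in lower]
    if all(c <= u for c, u in zip(phi(upper), upper)):
        # phi(u) <= u certifies u >= lfp(phi), so [lower, upper] brackets the solution.
        return lower, upper
    return None  # guess not inductive; a complete OVI would tighten tol and retry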
“…This uniformity may be misleading. Indeed, for stochastic games and a particular technique, using LP to solve the underlying MDPs may be preferential [3,Appendix E.4]. For examples in runtime assurance, numerical instability meant that PI was preferred [32,Sect.…”
Section: Introduction
confidence: 99%