2013
DOI: 10.1145/2432622.2432623
Strategy Iteration Is Strongly Polynomial for 2-Player Turn-Based Stochastic Games with a Constant Discount Factor

Abstract: Ye showed recently that the simplex method with Dantzig pivoting rule, as well as Howard's policy iteration algorithm, solve discounted Markov decision processes (MDPs), with a constant discount factor, in strongly polynomial time. More precisely, Ye showed that both algorithms terminate after at most $O\left(\frac{mn}{1-\gamma}\log\frac{n}{1-\gamma}\right)$ iterations, where $n$ is the number of states, $m$ is the total number of actions in the MDP, and $0 < \gamma < 1$ is the discount factor. We improve Ye's analysis in two respects. First, we improve the bound given by Ye and…
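To ground the abstract, here is a minimal sketch of Howard's policy iteration for a discounted MDP. The dense NumPy encoding (arrays P and r) and the function name are illustrative assumptions of mine, not notation from the paper; one iteration counted by Ye's bound is one pass of the while loop, i.e. an exact policy evaluation followed by a greedy improvement over all states.

import numpy as np

def policy_iteration(P, r, gamma):
    """P: (n, k, n) transition probabilities, r: (n, k) rewards,
    0 < gamma < 1. Returns the optimal values and an optimal policy."""
    n, k, _ = P.shape
    policy = np.zeros(n, dtype=int)            # arbitrary initial policy
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) v = r_pi exactly.
        P_pi = P[np.arange(n), policy]         # (n, n) rows chosen by policy
        r_pi = r[np.arange(n), policy]         # (n,)
        v = np.linalg.solve(np.eye(n) - gamma * P_pi, r_pi)
        # Policy improvement: Howard's rule switches every improvable state.
        q = r + gamma * (P @ v)                # (n, k) one-step lookahead
        if np.allclose(q.max(axis=1), q[np.arange(n), policy]):
            return v, policy                   # no improving switch remains
        policy = q.argmax(axis=1)

# Example: a 2-state, 2-action MDP.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.0, 1.0]]])
r = np.array([[1.0, 0.0],
              [0.5, 2.0]])
v, pi = policy_iteration(P, r, gamma=0.9)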

Cited by 75 publications (81 citation statements)
References 35 publications
“…[11] and [24] improved the iteration bound to $O\left(\frac{|S||A|}{1-\gamma}\log\frac{1}{1-\gamma}\right)$. Recent developments [18,19] showed that linear programs can be solved in $\tilde{O}(\sqrt{\operatorname{rank}(A)})$ linear system solves, which, applied to DMDP, leads to running times of $\tilde{O}(|S|^{2.5}|A|L)$ and $O(|S|^{2.5}|A|\log(M/((1-\gamma)\epsilon)))$ (see Appendix B for a derivation).…”
Section: Previous Work
confidence: 99%
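For context, the linear program these running times refer to is the standard primal LP formulation of a DMDP; this is the textbook formulation, not a quotation from [18,19]:

\[
\min_{v \in \mathbb{R}^{|S|}} \sum_{s \in S} v_s
\quad \text{s.t.} \quad
v_s \ \ge\ r_{s,a} + \gamma \sum_{s' \in S} P(s' \mid s, a)\, v_{s'}
\qquad \forall s \in S,\ a \in A_s .
\]

The constraint matrix has one row per state-action pair and one column per state, so $\operatorname{rank}(A) \le |S|$; combining the $\tilde{O}(\sqrt{\operatorname{rank}(A)})$ solve count of [18,19] with a per-solve cost on the order of $|S|^2|A|$ for this matrix is, presumably, where the $|S|^{2.5}|A|$ factors above come from.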
“…The paper [7] solves a considerable generalization of the Markov decision problem with complexity similar to that of [15]. Here one assumes that A and c are as in a Markov decision problem.…”
Section: Theorem 3 (If the Optimal Value of the Linear Program LP(A) …)
confidence: 99%
“…Section 3 discusses the Markov decision problem. The complexity results of [15] and [7] have the term 1 − γ in the denominator. This leads us to look for equivalent Markov decision problems for which this term is as large as possible.…”
confidence: 99%
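A quick numeric illustration (mine, not from [7] or [15]) of why one wants $1-\gamma$ as large as possible:

\[
\gamma = 0.9 \ \Rightarrow\ \frac{1}{1-\gamma} = 10,
\qquad
\gamma = 0.99 \ \Rightarrow\ \frac{1}{1-\gamma} = 100 .
\]

Raising the discount factor from 0.9 to 0.99 thus inflates the $1/(1-\gamma)$ factor in these iteration bounds tenfold, so an equivalent reformulation of the same problem with a smaller effective discount factor tightens the bounds correspondingly.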
“…For total- and average-reward MDPs, the largest known lower bound is also exponential and was recently established by Fearnley through a carefully built family of examples [6], based on a construction for parity games proposed by Friedmann [9]. This was a breakthrough after more than 25 years of research on the complexity of PI [11], [15]. The story seems different for discounted-reward MDPs, for which a strongly polynomial upper bound was recently found by Ye [20], yet only for a fixed discount factor; PI is shown to run in at most $\frac{n^2(k-1)}{1-\lambda}\cdot\log\frac{n^2}{1-\lambda}$ iterations in that case. This bound was later improved by a factor of $n$ by Hansen et al. [11] and adapted to two-player turn-based zero-sum games, a natural two-player extension of MDPs to which PI also applies. Thereby, they provided the first (strongly) polynomial-time algorithm for this latter class of problems.…”
Section: Introduction
confidence: 99%
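To make the two-player extension concrete, here is a minimal sketch of strategy iteration for a turn-based discounted zero-sum game, in the spirit of the variant analyzed by Hansen et al. [11]: the max player improves greedily while the min player replies with an exact best response. The game encoding (an owner array plus dense P and r, reusing the MDP arrays from the earlier sketch) is my own assumption for illustration.

import numpy as np

def best_response_values(P, r, gamma, owner, sigma):
    """Values when the max player fixes strategy sigma and the min player
    best-responds; the reply is found by policy iteration restricted to
    min-controlled states (owner[s] == 1)."""
    n, k, _ = P.shape
    tau = np.zeros(n, dtype=int)                   # min player's strategy
    while True:
        choice = np.where(owner == 0, sigma, tau)  # joint strategy
        P_j = P[np.arange(n), choice]
        r_j = r[np.arange(n), choice]
        v = np.linalg.solve(np.eye(n) - gamma * P_j, r_j)
        q = r + gamma * (P @ v)                    # (n, k) lookahead values
        cur = q[np.arange(n), choice]
        best = np.where(owner == 1, q.min(axis=1), cur)
        if np.allclose(best, cur):
            return v                               # min can no longer improve
        tau = np.where(owner == 1, q.argmin(axis=1), tau)

def strategy_iteration(P, r, gamma, owner):
    """owner[s] = 0 if state s is controlled by max, 1 if by min."""
    n, k, _ = P.shape
    sigma = np.zeros(n, dtype=int)                 # max player's strategy
    while True:
        v = best_response_values(P, r, gamma, owner, sigma)
        q = r + gamma * (P @ v)
        cur = q[np.arange(n), sigma]
        improvable = (owner == 0) & (q.max(axis=1) > cur + 1e-9)
        if not improvable.any():
            return v, sigma                        # equilibrium value reached
        sigma = np.where(improvable, q.argmax(axis=1), sigma)

The outer loop mirrors Howard's policy iteration for the max player; the iteration bound discussed above counts its passes, with each best-response computation itself amounting to an MDP solve.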