Complexity of finite-horizon Markov decision process problems

Mundhenk, Martin; Goldsmith, Judy; Lusena, Christopher; Allender, Eric

doi:10.1145/347476.347480

Cited by 114 publications

(106 citation statements)

References 44 publications

Citation types: Supporting, Mentioning (104), Contrasting


“…The crucial assumption is here the presence of nodes with simultaneous actions. Without such nodes, the solving of such games is well known (see [11,13,12] for more on this).…”

Section: Games With Simultaneous Actions (GSA); type: mentioning

confidence: 99%

“…exponential time, exponential space, doubly exponential time) for the fully observable, no observation, and partially observable case respectively for the criterion of deciding whether a 100% winning strategy exists³. With exponential horizon, the complexities decrease to EXP, NEXP, EXPSPACE respectively [11]. - With two players without random part, the problem of approximating the best winning probability that can be achieved regardless of the opponent strategy is undecidable [14] by reduction to the one-player randomized case above in the no observation case; the best complexity upper bounds for bounded horizon are 3EXP (for exponential horizon) and 2EXP (for polynomial horizon).…”

Section: Introduction; type: mentioning

confidence: 99%

“…The existence of strategies winning with probability 1, independently of the opponent, is decidable for 2 players, even in partially observable environments (see [6], showing that this is not true if we have a team of 2 players against a third player). - The fully observable setting is always decidable, with complexity reduced by far in the case of limited horizon; see [13,11] for more on this for the case in which we consider the existence of strategies winning with probability 1 and [14] for the choice of optimal moves.…”

Section: Introduction; type: mentioning

confidence: 99%


Upper Confidence Trees with Short Term Partial Information

Teytaud, Flory

2011

Applications of Evolutionary Computation



“…Unfortunately, obtaining the true H-horizon optimal value is often difficult, e.g., due to the large state space (see, e.g., [29] for a discussion of the complexity of solving finite-horizon MDPs). Motivated by this, we study an approximate receding horizon control that uses an approximate value function as an approximate solution of $V^*_{H-1}$ for some H < ∞.…”

Section: Receding Horizon Control; type: mentioning

confidence: 99%

Approximate Receding Horizon Approach for Markov Decision Processes: Average Award Case

Chang, Marcus

2002


Abstract: We consider an approximation scheme for solving Markov Decision Processes (MDPs) with countable state space, finite action space, and bounded rewards that uses an approximate solution of a fixed finite-horizon sub-MDP of a given infinite-horizon MDP to create a stationary policy, which we call "approximate receding horizon control". We first analyze the performance of the approximate receding horizon control for infinite-horizon average reward under an ergodicity assumption, which also generalizes the result obtained by White [36]. We then study two examples of the approximate receding horizon control via lower bounds to the exact solution to the sub-MDP. The first control policy is based on a finite-horizon approximation of Howard's policy improvement of a single policy and the second policy is based on a generalization of the single policy improvement for multiple policies. Along the study, we also provide a simple alternative proof on the policy improvement for countable state space. We finally discuss practical implementations of these schemes via simulation.
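To make the receding horizon idea in this abstract concrete, here is a minimal Python sketch. It is an illustration only and not the authors' method: it uses exact backward induction where the abstract calls for an approximate value function, it assumes a finite state space, and the two-state MDP (`P`, `R`) and both function names are hypothetical.

```python
# Hypothetical toy MDP (an assumption for illustration, not from the paper).
# P[a][s][t]: probability of moving from state s to state t under action a.
# R[s][a]: immediate reward for taking action a in state s.
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # action 0: stay in the current state
    [[0.0, 1.0], [1.0, 0.0]],  # action 1: switch to the other state
]
R = [
    [0.0, 0.0],  # state 0 pays nothing
    [1.0, 1.0],  # state 1 pays 1 per step
]


def q_value(P, R, V, s, a):
    """One-step lookahead: immediate reward plus expected value of the next state."""
    return R[s][a] + sum(P[a][s][t] * V[t] for t in range(len(V)))


def finite_horizon_values(P, R, horizon):
    """Backward induction from V_0 = 0; returns the optimal `horizon`-step values."""
    n_states, n_actions = len(R), len(P)
    V = [0.0] * n_states
    for _ in range(horizon):
        V = [
            max(q_value(P, R, V, s, a) for a in range(n_actions))
            for s in range(n_states)
        ]
    return V


def receding_horizon_policy(P, R, horizon):
    """Stationary policy that acts greedily with respect to the (H-1)-step values."""
    n_states, n_actions = len(R), len(P)
    V = finite_horizon_values(P, R, horizon - 1)
    return [
        max(range(n_actions), key=lambda a: q_value(P, R, V, s, a))
        for s in range(n_states)
    ]


policy = receding_horizon_policy(P, R, horizon=3)
```

In this toy problem the policy for `horizon=3` switches out of state 0 and stays in state 1. Replacing `finite_horizon_values` with a cheaper estimate of the same quantity would give the "approximate" variant the abstract describes.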


“…In terms of computational complexity, optimally solving a finite-horizon Dec-POMDP is NEXP-Complete (Bernstein et al., 2002). In contrast, finite-horizon POMDPs are PSPACE-complete (Mundhenk, Goldsmith, Lusena, & Allender, 2000), a strictly lower complexity class that highlights the difficulty of solving Dec-POMDPs.…”

Section: Introduction; type: mentioning

confidence: 99%

Probabilistic Inference Techniques for Scalable Multiagent Decision Making

Kumar

Zilberstein

Toussaint

2015

JAIR


Decentralized POMDPs provide an expressive framework for multiagent sequential decision making. However, the complexity of these models-NEXP-Complete even for two agents-has limited their scalability. We present a promising new class of approximation algorithms by developing novel connections between multiagent planning and machine learning. We show how the multiagent planning problem can be reformulated as inference in a mixture of dynamic Bayesian networks (DBNs). This planning-as-inference approach paves the way for the application of efficient inference techniques in DBNs to multiagent decision making. To further improve scalability, we identify certain conditions that are sufficient to extend the approach to multiagent systems with dozens of agents. Specifically, we show that the necessary inference within the expectation-maximization framework can be decomposed into processes that often involve a small subset of agents, thereby facilitating scalability. We further show that a number of existing multiagent planning models satisfy these conditions. Experiments on large planning benchmarks confirm the benefits of our approach in terms of runtime and scalability with respect to existing techniques.

