2016
DOI: 10.1613/jair.4623

Optimally Solving Dec-POMDPs as Continuous-State MDPs

Abstract: Decentralized partially observable Markov decision processes (Dec-POMDPs) provide a general model for decision-making under uncertainty in decentralized settings, but are difficult to solve optimally (NEXP-Complete). As a new way of solving these problems, we introduce the idea of transforming a Dec-POMDP into a continuous-state deterministic MDP with a piecewise-linear and convex value function. This approach makes use of the fact that planning can be accomplished in a centralized offline manner, while execut…

Cited by 62 publications (78 citation statements)
References 44 publications
“…This intuition is correct [Nayyar et al., 2011, Dibangoye et al., 2013, MacDermed and Isbell, 2013]. In particular, it is possible to make a reduction to a special type of POMDP: a non-observable MDP (a POMDP with just one 'NULL' observation).…”
Section: A NOMDP Formulation
confidence: 94%
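To make the NOMDP notion in the excerpt above concrete, here is a minimal Python sketch of a POMDP whose observation set is the single dummy symbol 'NULL'. All names and interfaces are illustrative assumptions, not code from the cited papers.

```python
# Sketch (assumed interfaces): a NOMDP is a POMDP with one 'NULL' observation,
# so observations carry no information and the belief update is pure prediction.
from dataclasses import dataclass
from typing import Callable, Dict

State, Action, Obs = str, str, str

@dataclass
class POMDP:
    states: list
    actions: list
    observations: list
    T: Callable[[State, Action, State], float]  # P(s' | s, a)
    O: Callable[[Action, State, Obs], float]    # P(o | a, s')
    R: Callable[[State, Action], float]

def as_nomdp(states, actions, T, R) -> POMDP:
    """Wrap a model as a NOMDP: the single 'NULL' observation is emitted
    with probability 1 after every transition."""
    return POMDP(
        states=states,
        actions=actions,
        observations=["NULL"],
        T=T,
        O=lambda a, s_next, o: 1.0 if o == "NULL" else 0.0,
        R=R,
    )

def belief_update(m: POMDP, belief: Dict[State, float], a: Action, o: Obs):
    """Standard POMDP belief update; with only the 'NULL' observation the
    correction step is vacuous, leaving a deterministic prediction step."""
    new_b = {}
    for s_next in m.states:
        pred = sum(belief[s] * m.T(s, a, s_next) for s in m.states)
        new_b[s_next] = m.O(a, s_next, o) * pred
    z = sum(new_b.values()) or 1.0
    return {s: p / z for s, p in new_b.items()}
```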
“…However, it turns out that it is possible to replace the dependence on the past joint policy by a so-called plan-time sufficient statistic: a distribution over histories and states [Oliehoek et al., 2013a, Dibangoye et al., 2013]. This is useful, since many past joint policies can potentially map to the same statistic, as indicated in Figure 4.5.…”
Section: Plan-Time Sufficient Statistics
confidence: 99%
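The plan-time sufficient statistic described above can be pictured as a probability distribution over (hidden state, joint observation history) pairs. The following sketch, with assumed names that do not come from the cited papers, shows why it suffices for planning: the expected reward of a joint decision rule depends on the past only through this distribution, so any two past joint policies inducing the same statistic are interchangeable.

```python
# Sketch: a plan-time sufficient statistic as a dict
# {(state, joint_history): probability}, and an expected-reward computation
# that reads the past only through that statistic.

def expected_reward(statistic, decision_rule, R):
    """decision_rule maps each joint observation history to a joint action;
    R(state, joint_action) is the immediate reward."""
    total = 0.0
    for (state, joint_history), prob in statistic.items():
        joint_action = decision_rule[joint_history]
        total += prob * R(state, joint_action)
    return total
```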
“…Recently, major strides in solving Dec-POMDPs have been made. In particular, it has been shown that there is a reduction from a Dec-POMDP to a special type of centralized POMDP called a non-observable Markov decision process (NOMDP) (MacDermed and Isbell, 2013; Nayyar et al., 2013; Dibangoye et al., 2013; Oliehoek and Amato, 2014). This allows POMDP solution methods to be employed in the context of Dec-POMDPs.…”
Section: Other Decision Problems
confidence: 99%
“…Recently, a number of approaches have been developed that transform a Dec-POMDP into a continuous-state MDP and then use techniques from the POMDP literature to solve the continuous-state MDP (Dibangoye, Amato, Doniec, & Charpillet, 2013a; Dibangoye, Amato, Buffet, & Charpillet, 2013b). The state in such a continuous MDP reformulation of a Dec-POMDP, also called the occupancy state, is the probability distribution over the world state and the history of observations each agent has received.…”
Section: Related Work
confidence: 99%
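The occupancy state mentioned in this excerpt evolves deterministically once a decentralized decision rule is fixed, which is what makes the reformulation a continuous-state deterministic MDP. Below is a minimal sketch of one forward update; the model interfaces (T, O) and data layout are assumptions for illustration, not the cited authors' code.

```python
# Sketch: one-step update of an occupancy state
# {(state, joint_history): probability} under a decentralized decision rule,
# where joint_history is a tuple of per-agent observation histories.
from collections import defaultdict

def update_occupancy(occupancy, decision_rules, T, O, states, joint_obs):
    """decision_rules: one dict per agent, mapping its history to its action.
    T(s, a, s'): transition probability; O(a, s', o): joint-observation
    probability, with o a tuple of per-agent observations."""
    new_occ = defaultdict(float)
    for (s, joint_hist), p in occupancy.items():
        # Each agent acts on its own history only (decentralized execution).
        a = tuple(rule[h] for rule, h in zip(decision_rules, joint_hist))
        for s_next in states:
            for o in joint_obs:
                prob = p * T(s, a, s_next) * O(a, s_next, o)
                if prob > 0.0:
                    new_hist = tuple(h + (oi,) for h, oi in zip(joint_hist, o))
                    new_occ[(s_next, new_hist)] += prob
    return dict(new_occ)
```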