2008
DOI: 10.1613/jair.2447

Optimal and Approximate Q-value Functions for Decentralized POMDPs

Abstract: Decision-theoretic planning is a popular approach to sequential decision making problems, because it treats uncertainty in sensing and acting in a principled way. In single-agent frameworks like MDPs and POMDPs, planning can be carried out by resorting to Q-value functions: an optimal Q-value function Q* is computed in a recursive manner by dynamic programming, and then an optimal policy is extracted from Q*. In this paper we study whether similar Q-value functions can be defined for decentralized POMDP models…
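For reference only, the following is a minimal sketch (not code from the paper) of the single-agent pipeline the abstract contrasts against: tabular Q-value iteration on a hypothetical two-state MDP, computing Q* by repeated Bellman backups and then extracting a greedy policy from Q*. The transition and reward tables are made up for illustration; the Dec-POMDP Q-value functions studied in the paper are defined over joint action-observation histories and are substantially more involved.

# Minimal sketch: tabular Q-value iteration for a single-agent MDP.
# The tiny two-state model below is hypothetical, not from the paper.
import numpy as np

n_states, n_actions = 2, 2
gamma = 0.95

# Hypothetical transition model P[s, a, s'] and reward R[s, a].
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.0, 1.0], [0.7, 0.3]]])
R = np.array([[1.0, 0.0],
              [0.5, 2.0]])

Q = np.zeros((n_states, n_actions))
for _ in range(1000):
    # Bellman optimality backup:
    # Q(s,a) = R(s,a) + gamma * sum_s' P(s'|s,a) * max_a' Q(s',a')
    Q_new = R + gamma * (P @ Q.max(axis=1))
    if np.max(np.abs(Q_new - Q)) < 1e-8:
        Q = Q_new
        break
    Q = Q_new

policy = Q.argmax(axis=1)  # greedy policy extracted from Q*
print("Q*:\n", Q)
print("greedy policy:", policy)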


Cited by 275 publications (203 citation statements)
References 64 publications
“…Each individual agent considers itself part of a hypothetical centralized "team-agent" that has joint control of all the agents that are included in the team and optimizes the joint reward of that team (Sugden 2003; Bratman 2014). Planning under this approach combines all the agents in a game into a single agent and finds a joint plan which optimizes that group objective (De Cote and Littman 2008; Oliehoek, Spaan, and Vlassis 2008; Kleiman-Weiner et al. 2016). If J is the set of agents that have joined together as a team, their joint plan can be characterized as:…”
Section: Operator Composition (Replace)
Mentioning; confidence: 99%
“…Emery-Montemerlo et al. (2004) approximated a cooperative POSG by a series of Bayesian games. Another cooperative POSG, called a decentralized partially observable Markov decision process (DEC-POMDP), has also been extensively studied by Becker et al. (2004), Bernstein et al. (2005), Seuken and Zilberstein (2007), and Oliehoek et al. (2008). A survey of the DEC-POMDP can be found in Oliehoek (2012).…”
Section: The Partially Observable Stochastic Game
Mentioning; confidence: 99%
“…When communication delays are limited and potentially stochastic, the problem can be modeled as a multiagent POMDP with delayed communication [33,54]. Finally, when no communication channel is present, the problem can be modeled as a decentralized POMDP [6,34]. Extending the POMDP-IR framework to any of these multiagent models is promising.…”
Section: Future Work
Mentioning; confidence: 99%