2021
DOI: 10.1609/aaai.v35i11.17166

Solving Common-Payoff Games with Approximate Policy Iteration

Abstract: For artificially intelligent learning systems to have widespread applicability in real-world settings, it is important that they be able to operate decentrally. Unfortunately, decentralized control is difficult: computing even an epsilon-optimal joint policy is a NEXP-complete problem. Nevertheless, a recently rediscovered insight, that a team of agents can coordinate via common knowledge, has given rise to algorithms capable of finding optimal joint policies in small common-payoff games. The Bayesian actio…
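
The common-knowledge coordination idea the abstract alludes to can be made concrete with a small sketch. The following is an illustrative toy under assumptions of ours, not the paper's algorithm: each agent runs the same deterministic procedure on the commonly known public history, so all agents arrive at the same joint prescription without exchanging messages. All identifiers (coordinator_plan, act, the observation strings) are hypothetical.

```python
# Illustrative toy, NOT the paper's algorithm: coordination via common
# knowledge. Each agent runs the same deterministic procedure on the
# commonly known public history, so all agents derive the same joint
# "prescription" without communicating. All names are hypothetical.
from typing import Dict, Tuple

Prescription = Dict[str, Dict[str, str]]  # agent -> (private obs -> action)

def coordinator_plan(public_history: Tuple[str, ...]) -> Prescription:
    """Map the common-knowledge public history to a prescription for every
    agent. In a real solver this table would be produced by planning; here
    it is a hard-coded toy rule."""
    if public_history and public_history[-1] == "hint":
        return {"agent_0": {"red": "play", "blue": "discard"},
                "agent_1": {"red": "play", "blue": "discard"}}
    return {"agent_0": {"red": "discard", "blue": "discard"},
            "agent_1": {"red": "discard", "blue": "discard"}}

def act(agent_id: str, public_history: Tuple[str, ...], private_obs: str) -> str:
    # Every agent computes the SAME plan from common knowledge, then applies
    # its own private observation -- coordination without communication.
    return coordinator_plan(public_history)[agent_id][private_obs]

# Both agents behave consistently given the shared public history.
assert act("agent_0", ("hint",), "red") == "play"
assert act("agent_1", ("hint",), "blue") == "discard"
```

Because the plan depends only on common knowledge, each agent can apply its own private observation to the shared prescription, which is what makes decentralized execution of a centrally computed policy possible.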

Cited by 3 publications (2 citation statements)
References 44 publications

“…Another work used a method inspired by DeepStack applied to The Resistance (13). In the cooperative setting, several works have made use of belief-based learning (and search) using public subgame decomposition (12, 61, 62), applied to Hanabi (11). Learning and game-theoretic reasoning were also recently combined to produce agents that play well with humans, without human data, on the collaborative game Overcooked (63).…”
Section: Related Work
Citation type: mentioning (confidence: 99%)
“…As discussed in the background, prior work provided a reduction from solving common-payoff games to solving belief MDPs; independently, Dibangoye et al. (2013b) and Oliehoek (2013) discovered similar reductions. These ideas have been leveraged in a large body of work in the decentralized control literature (Lessard & Nayyar, 2013; Nayyar et al., 2014; Arabneydi & Mahajan, 2014; Ouyang et al., 2015; Vasconcelos & Martins, 2016; Tavafoghi et al., 2016; Afshari & Mahajan, 2018; Gagrani & Nayyar, 2018; Tavafoghi et al., 2018; Zhang et al., 2019; Gupta, 2021) and the machine learning literature (Dibangoye et al., 2013a; MacDermed & Isbell, 2013; Dibangoye et al., 2014a; 2014b; Dibangoye & Buffet, 2018; Foerster et al., 2019; Sokota et al., 2021; Fickinger et al., 2021; Sokota et al., 2022b; Kao et al., 2022). Use cases include game solving (Dibangoye et al., 2013b), expert iteration (Sokota et al., 2021), and decision-time planning (Fickinger et al., 2021; Sokota et al., 2022b).…”
Section: Related Work
Citation type: mentioning (confidence: 99%)
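
A minimal sketch may help readers unfamiliar with the reduction this statement refers to. The code below is an assumption-laden illustration of the core public-belief update, not the formulation of any cited paper: the common belief over private states is the belief-MDP state, a joint prescription is its action, and Bayes' rule incorporates the publicly observed outcome. All names and the toy observation model are hypothetical.

```python
# Hedged sketch of the belief-MDP view (an illustration under assumptions,
# not the exact formulation of any cited paper). The common belief over
# private states is updated by Bayes' rule given the joint prescription
# and the publicly observed outcome. All names here are hypothetical.
from typing import Callable, Dict

def update_belief(belief: Dict[str, float],
                  prescription: Dict[str, str],
                  public_obs: str,
                  obs_model: Callable[[str, str, str], float]) -> Dict[str, float]:
    """Bayes-update the common belief.

    belief: private state -> probability.
    prescription: private state -> action (the belief-MDP 'action').
    obs_model(state, action, public_obs): likelihood of the public outcome.
    """
    posterior = {s: p * obs_model(s, prescription[s], public_obs)
                 for s, p in belief.items()}
    z = sum(posterior.values())
    if z == 0.0:
        raise ValueError("public observation has zero probability under belief")
    return {s: p / z for s, p in posterior.items()}

# Toy usage: observing that a card was played rules out the state whose
# prescribed action was to discard.
belief = {"red": 0.5, "blue": 0.5}
prescription = {"red": "play", "blue": "discard"}
obs_model = lambda s, a, o: 1.0 if (a == "play") == (o == "card_played") else 0.0
print(update_belief(belief, prescription, "card_played", obs_model))
# -> {'red': 1.0, 'blue': 0.0}
```

Because every agent can reproduce this belief from public information alone, a centralized planner operating on beliefs yields policies that the agents can execute decentrally.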