Solving Common-Payoff Games with Approximate Policy Iteration

Sokota, Samuel; Lockhart, Edward; Timbers, Finbarr; Davoodi, Elnaz; D'Orazio, Ryan; Burch, Neil; Schmid, Martin; Bowling, Michael; Lanctot, Marc

doi:10.1609/aaai.v35i11.17166

Cited by 3 publications

(2 citation statements)

References 44 publications


“…Another work used a method inspired by DeepStack applied to The Resistance (13). In the cooperative setting, several works have made use of belief-based learning (and search) using public subgame decomposition (12,61,62), applied to Hanabi (11). Learning and game-theoretic reasoning were also recently combined to produce agents that play well with humans without human data on the collaborative game Overcooked (63).…”

Section: Related Work (mentioning)

confidence: 99%

Student of Games: A unified learning algorithm for both perfect and imperfect information games

Schmid, Moravčík, Burch et al. 2023

Sci. Adv.

Self Cite


Games have a long history as benchmarks for progress in artificial intelligence. Approaches using search and learning produced strong performance across many perfect information games, and approaches using game-theoretic reasoning and learning demonstrated strong performance for specific imperfect information poker variants. We introduce Student of Games, a general-purpose algorithm that unifies previous approaches, combining guided search, self-play learning, and game-theoretic reasoning. Student of Games achieves strong empirical performance in large perfect and imperfect information games—an important step toward truly general algorithms for arbitrary environments. We prove that Student of Games is sound, converging to perfect play as available computation and approximation capacity increases. Student of Games reaches strong performance in chess and Go, beats the strongest openly available agent in heads-up no-limit Texas hold’em poker, and defeats the state-of-the-art agent in Scotland Yard, an imperfect information game that illustrates the value of guided search, learning, and game-theoretic reasoning.



“…As discussed in the background, provided a reduction from solving common-payoff games to solving belief MDPs; independently, Dibangoye et al (2013b) and Oliehoek (2013) discovered similar reductions. These ideas have been leveraged in a large body of work in decentralized control literature (Lessard & Nayyar, 2013; Nayyar et al, 2014; Arabneydi & Mahajan, 2014; Ouyang et al, 2015; Vasconcelos & Martins, 2016; Tavafoghi et al, 2016; Afshari & Mahajan, 2018; Gagrani & Nayyar, 2018; Tavafoghi et al, 2018; Zhang et al, 2019; Gupta, 2021) and machine learning literature (Dibangoye et al, 2013a; MacDermed & Isbell, 2013; Dibangoye et al, 2014a;b; Dibangoye & Buffet, 2018; Foerster et al, 2019; Sokota et al, 2021; Fickinger et al, 2021; Sokota et al, 2022b; Kao et al, 2022). Use cases include game solving (Dibangoye et al, 2013b), expert iteration (Sokota et al, 2021), and decision-time planning (Fickinger et al, 2021; Sokota et al, 2022b).…”

Section: Related Work (mentioning)

confidence: 99%

Abstracting Imperfect Information Away from Two-Player Zero-Sum Games

Sokota, D'Orazio, Ling et al. 2023

Preprint


In their seminal work, showed that imperfect information can be abstracted away from common-payoff games by having players publicly announce their policies as they play. This insight underpins sound solvers and decision-time planning algorithms for common-payoff games. Unfortunately, a naive application of the same insight to two-player zero-sum games fails because Nash equilibria of the game with public policy announcements may not correspond to Nash equilibria of the original game. As a consequence, existing sound decision-time planning algorithms require complicated additional mechanisms that have unappealing properties. The main contribution of this work is showing that certain regularized equilibria do not possess the aforementioned noncorrespondence problem; thus, computing them can be treated as perfect information problems. Because these regularized equilibria can be made arbitrarily close to Nash equilibria, our result opens the door to a new perspective on solving two-player zero-sum games and, in particular, yields a simplified framework for decision-time planning in two-player zero-sum games, void of the unappealing properties that plague existing decision-time planning approaches.
