2022
DOI: 10.1126/science.add4679

Mastering the game of Stratego with model-free multiagent reinforcement learning

Abstract: We introduce DeepNash, an autonomous agent that plays the imperfect information game Stratego at a human expert level. Stratego is one of the few iconic board games that artificial intelligence (AI) has not yet mastered. It is a game characterized by a twin challenge: It requires long-term strategic thinking as in chess, but it also requires dealing with imperfect information as in poker. The technique underpinning DeepNash uses a game-theoretic, model-free deep reinforcement learning method, without search, t…
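
In a two-player zero-sum game, "playing at a Nash equilibrium" is conventionally quantified by exploitability, the gap between what best-responding opponents could gain and what the policy pair secures. A sketch of that standard definition, added here for context (the symbols u and pi follow the usual conventions and are not text from the paper):

% Exploitability (Nash gap) of a policy pair (\pi_1, \pi_2) in a
% two-player zero-sum game, where u(\pi_1, \pi_2) is the expected
% payoff to player 1. Standard definition, assumed applicable here.
\[
\mathrm{Expl}(\pi_1, \pi_2)
  = \max_{\pi_1'} u(\pi_1', \pi_2) \;-\; \min_{\pi_2'} u(\pi_1, \pi_2'),
\]
\[
\mathrm{Expl}(\pi_1, \pi_2) \ge 0,
\qquad
\mathrm{Expl}(\pi_1, \pi_2) = 0
  \iff (\pi_1, \pi_2) \text{ is a Nash equilibrium.}
\]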

Cited by 76 publications (40 citation statements)
References 27 publications
“…However, when applied to CMDPs, minimizing the squared gradient with respect to the Lagrange multiplier(s) is equivalent to an apprenticeship learning problem (Abbeel & Ng, 2004; Zahavy et al., 2020a; Shani et al., 2022), which is itself a convex MDP representing a challenging optimization problem (Zahavy et al., 2021b). Perolat et al. (2021) instead augment the objective with an adaptive regularizer, solving the resulting convex-concave (but biased) problem exactly before iteratively refitting with progressively lesser regularization.…”
Section: A. Additional Related Work (mentioning)
confidence: 99%
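
The regularize-solve-refit scheme the quote describes can be sketched on a small zero-sum matrix game: penalize each player's deviation from a reference policy with a KL term, solve the now strongly convex-concave regularized game, then move the reference to the solution and shrink the regularizer. A minimal sketch under those assumptions; the solver, step sizes, and example game below are illustrative choices, not the construction of Perolat et al. (2021):

import numpy as np

def softmax(z):
    z = z - z.max()           # numerical stability
    e = np.exp(z)
    return e / e.sum()

def solve_regularized(A, x_ref, y_ref, eta, iters=2000, lam=0.1):
    """Approximate the unique saddle point of the KL-regularized game
    max_x min_y  x^T A y - eta*KL(x||x_ref) + eta*KL(y||y_ref)
    by damped smoothed-best-response iteration (an illustrative solver)."""
    x, y = x_ref.copy(), y_ref.copy()
    for _ in range(iters):
        bx = softmax(np.log(x_ref) + (A @ y) / eta)    # regularized BR, max player
        by = softmax(np.log(y_ref) - (A.T @ x) / eta)  # regularized BR, min player
        x = (1 - lam) * x + lam * bx                   # damping stabilizes iterates
        y = (1 - lam) * y + lam * by
    return x, y

# Outer loop: solve the regularized game, refit the reference to the
# solution, and apply progressively lesser regularization, as in the quote.
A = np.array([[0., 1., -1.],
              [-1., 0., 1.],
              [1., -1., 0.]])            # rock-paper-scissors (illustrative)
x_ref = np.array([0.6, 0.3, 0.1])        # deliberately non-uniform start
y_ref = np.array([0.2, 0.5, 0.3])
eta = 1.0
for _ in range(10):
    x_ref, y_ref = solve_regularized(A, x_ref, y_ref, eta)
    eta *= 0.8
print(x_ref, y_ref)                      # approaches the uniform Nash equilibrium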
“…MiniMaxKL Objectives in Two-Player Zero-Sum Games. A number of recent prior works have made use of MiniMaxEnt and MiniMaxKL objectives for the purpose of inducing last-iterate convergence (Perolat et al., 2021; Cen et al., 2021; Zeng et al., 2022; Sokota et al., 2022a; Perolat et al., 2022). While we also make use of these objectives, our use case (eliminating the noncorrespondence problem) differs substantially.…”
Section: Related Work (mentioning)
confidence: 99%
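
The MiniMaxKL objective these works share can be written down directly: a zero-sum payoff regularized by KL terms toward anchor policies, which makes the problem strongly convex-concave, so gradient-style dynamics have a unique saddle point to converge to in the last iterate. A sketch of the common form (the anchors \mu_i and weight \alpha follow the usual conventions; individual papers vary the details):

\[
\min_{\pi_1} \max_{\pi_2}\;
  u(\pi_1, \pi_2)
  \;+\; \alpha\, \mathrm{KL}(\pi_1 \,\|\, \mu_1)
  \;-\; \alpha\, \mathrm{KL}(\pi_2 \,\|\, \mu_2)
\]

% MiniMaxEnt is the special case where each \mu_i is uniform, so each
% KL term reduces to a negative entropy up to an additive constant.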
“…A real game system can involve a large number of strategies, most of which would be dominated during the process of finding a Nash equilibrium [3,22]. As illustrated in Figure 1a, the full equilibrium-finding process can be classified into three stages:…”
Section: Collapse (mentioning)
confidence: 99%
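
The role of dominated strategies can be made concrete with iterated elimination of strictly dominated strategies in a zero-sum matrix game. A minimal sketch (the payoff matrix is illustrative, and checking domination only against pure strategies is a simplification; domination by mixed strategies also matters in general):

import numpy as np

def eliminate_dominated(A):
    """Iterated elimination of strictly dominated pure strategies in a
    zero-sum matrix game where the row player maximizes A and the
    column player minimizes it."""
    rows = list(range(A.shape[0]))
    cols = list(range(A.shape[1]))
    changed = True
    while changed:
        changed = False
        # A row is dominated if another row pays strictly more against
        # every surviving column.
        for i in list(rows):
            if any(np.all(A[np.ix_([k], cols)] > A[np.ix_([i], cols)])
                   for k in rows if k != i):
                rows.remove(i)
                changed = True
        # A column is dominated if another column concedes strictly less
        # in every surviving row (the column player minimizes).
        for j in list(cols):
            if any(np.all(A[np.ix_(rows, [k])] < A[np.ix_(rows, [j])])
                   for k in cols if k != j):
                cols.remove(j)
                changed = True
    return rows, cols

A = np.array([[3, 1, 4],
              [1, 0, 2],
              [2, 1, 3]])              # illustrative row-player payoffs
print(eliminate_dominated(A))          # -> ([0, 2], [1])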
“…This study has real-life applications in fields including artificial intelligence [22] and the social systems addressed in [9,25]. The collapse is a legitimate constituent part of the process of game evolution.…”
Section: Related Work (mentioning)
confidence: 99%