2019
DOI: 10.48550/arXiv.1901.07621
Preprint

Single Deep Counterfactual Regret Minimization

Abstract: Counterfactual Regret Minimization (CFR) is the most successful algorithm for finding approximate Nash equilibria in imperfect information games. However, CFR's reliance on full game-tree traversals limits its scalability and generality. Therefore, the game's state- and action-space is often abstracted (i.e. simplified) for CFR, and the resulting strategy is then translated back to the full game. This requires extensive expert knowledge, is not possible in many games outside of poker, and often converges to hig…
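As a quick orientation for the update the abstract alludes to, here is a minimal sketch (not taken from the paper) of the regret-matching rule that vanilla CFR applies at every information set on each full game-tree traversal; the array of cumulative regrets is a hypothetical stand-in for the game-specific bookkeeping.

```python
import numpy as np

def regret_matching(cumulative_regrets):
    """Core CFR update: play each action in proportion to its positive
    cumulative counterfactual regret; fall back to uniform if none is positive."""
    positive = np.maximum(cumulative_regrets, 0.0)
    total = positive.sum()
    if total > 0:
        return positive / total
    return np.full(len(cumulative_regrets), 1.0 / len(cumulative_regrets))

# Toy example: three actions with cumulative regrets accumulated so far.
regrets = np.array([4.0, -2.0, 1.0])
print(regret_matching(regrets))  # -> [0.8, 0.0, 0.2]
```

Because these regrets have to be updated at every information set on every iteration, vanilla CFR needs full traversals of the game tree, which is the scalability problem the abstract describes.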

Cited by 12 publications (27 citation statements) · References 11 publications

Citation statements (ordered by relevance):
“…In this section, we conduct extensive experiments to evaluate the proposed L2E framework. We evaluate algorithm performance on Leduc poker, BigLeduc poker, and a Grid Soccer environment, which are commonly used benchmarks for opponent modeling (Lanctot et al., 2017; Steinberger, 2019; He et al., 2016). We first verify that the base policy trained with our L2E framework can quickly exploit a wide range of opponents with only a few gradient updates.…”
Section: Methods (mentioning)
confidence: 99%
See 1 more Smart Citation
“…In this section, we conduct extensive experiments to evaluate the proposed L2E framework. We evaluate algorithm performance on the Leduc poker, the BigLeduc poker, and a Grid Soccer environment, the commonly used benchmark for opponent modeling (Lanctot et al, 2017;Steinberger, 2019;He et al, 2016). We first verify that the trained base policy using our L2E framework quickly exploit a wide range of opponents with only a few gradient updates.…”
Section: Methodsmentioning
confidence: 99%
“…Furthermore, how to generate diverse strategies has been preliminarily studied in the reinforcement learning community. Specifically, diverse strategies can be obtained in various ways, including adding a diversity regularization term to the optimization objective (Abdullah et al., 2019), randomly searching in a diverse parameter space (Plappert et al., 2018; Fortunato et al., 2018), using information-based strategy proposals (Eysenbach et al., 2018; Gupta et al., 2018), and searching for diverse strategies with evolutionary algorithms (Agapitos et al., 2008; Wang et al., 2019; Jaderberg et al., 2017; 2019). More recently, researchers from DeepMind proposed a league training paradigm to obtain a Grandmaster-level StarCraft II AI (i.e., AlphaStar) by training a diverse league of continually adapting strategies and counter-strategies (Vinyals et al., 2019).…”
Section: Strategy Generation (mentioning)
confidence: 99%
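The first approach listed in that statement, adding a diversity regularizer to the optimization objective, might look roughly like the following sketch; the loss shape, the KL-based diversity term, and the weight `beta` are illustrative assumptions rather than the formulation of any cited work.

```python
import torch

def diversity_regularized_loss(task_loss, policy_logits, reference_logits, beta=0.1):
    """Illustrative objective: minimize the task loss while encouraging the current
    policy to stay different from a reference policy (e.g. one found earlier) by
    rewarding a large KL divergence between their action distributions."""
    log_p = torch.log_softmax(policy_logits, dim=-1)
    log_q = torch.log_softmax(reference_logits, dim=-1)
    kl = torch.sum(torch.exp(log_p) * (log_p - log_q), dim=-1).mean()
    return task_loss - beta * kl  # subtracting the KL term rewards divergence

# Toy check: identical policies give zero diversity bonus.
logits = torch.zeros(4, 3)
print(diversity_regularized_loss(torch.tensor(1.0), logits, logits))  # -> tensor(1.)
```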
“…Double Neural CFR [8] and Deep CFR [9] combine deep neural networks with vanilla CFR and linear CFR (LCFR), respectively. In addition, Single Deep CFR (SDCFR) [10] is a simplified variant of Deep CFR that uses only one neural network to approximate the values in LCFR. Moreover, public chance sampling CFR (PCCFR) [11], variance reduction in MCCFR (VR-MCCFR) [12], and discounted CFR (DCFR) [13] are all variants of vanilla CFR.…”
Section: Introduction (mentioning)
confidence: 99%
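The distinction drawn in that statement, that SDCFR keeps a single value (advantage) network and derives its play from it by regret matching, rather than also training a separate average-strategy network as Deep CFR does, can be illustrated with a small sketch; the network architecture and interfaces below are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class AdvantageNet(nn.Module):
    """Maps an information-set encoding to one predicted advantage per action
    (a stand-in for the single value network SDCFR keeps)."""
    def __init__(self, obs_dim, n_actions, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs):
        return self.net(obs)

def policy_from_advantages(advantages):
    """Regret matching over predicted advantages: positive advantages are
    normalized into a distribution, with a uniform fallback if none is positive."""
    positive = torch.clamp(advantages, min=0.0)
    total = positive.sum(dim=-1, keepdim=True)
    uniform = torch.full_like(advantages, 1.0 / advantages.shape[-1])
    return torch.where(total > 0, positive / total.clamp(min=1e-12), uniform)

# Usage sketch: the policy is derived on the fly from the single network.
net = AdvantageNet(obs_dim=16, n_actions=3)
obs = torch.randn(2, 16)
print(policy_from_advantages(net(obs)))  # each row sums to 1
```

Because the playing policy is obtained on the fly from the predicted advantages, no second, average-strategy network has to be trained or stored in this setup.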
“…CFR has achieved great success in IIGs, and many improved variants have been proposed over the years [17,18,19,20,21,22]. However, there is still a problem that needs to be solved: how to improve the generalization of CFR-based methods.…”
Section: Introduction (mentioning)
confidence: 99%