2019
DOI: 10.48550/arXiv.1901.07621
Preprint

Single Deep Counterfactual Regret Minimization

Abstract: Counterfactual Regret Minimization (CFR) is the most successful algorithm for finding approximate Nash equilibria in imperfect information games. However, CFR's reliance on full game-tree traversals limits its scalability and generality. Therefore, the game's state- and action-space is often abstracted (i.e. simplified) for CFR, and the resulting strategy is then translated back to the full game. This requires extensive expert knowledge, is not possible in many games outside of poker, and often converges to hig…
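As a quick orientation for the update the abstract alludes to, here is a minimal sketch (not taken from the paper) of the regret-matching rule that vanilla CFR applies at every information set on each full game-tree traversal; the array of cumulative regrets is a hypothetical stand-in for the game-specific bookkeeping.

```python
import numpy as np

def regret_matching(cumulative_regrets):
    """Core CFR update: play each action in proportion to its positive
    cumulative counterfactual regret; fall back to uniform if none is positive."""
    positive = np.maximum(cumulative_regrets, 0.0)
    total = positive.sum()
    if total > 0:
        return positive / total
    return np.full(len(cumulative_regrets), 1.0 / len(cumulative_regrets))

# Toy example: three actions with cumulative regrets accumulated so far.
regrets = np.array([4.0, -2.0, 1.0])
print(regret_matching(regrets))  # -> [0.8, 0.0, 0.2]
```

Because these regrets have to be updated at every information set on every iteration, vanilla CFR needs full traversals of the game tree, which is the scalability problem the abstract describes.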

Cited by 12 publications (27 citation statements) · References 11 publications

Citation statements (ordered by relevance):
“…In this section, we conduct extensive experiments to evaluate the proposed L2E framework. We evaluate algorithm performance on Leduc poker, BigLeduc poker, and a Grid Soccer environment, which are commonly used benchmarks for opponent modeling (Lanctot et al., 2017; Steinberger, 2019; He et al., 2016). We first verify that the base policy trained with our L2E framework can quickly exploit a wide range of opponents with only a few gradient updates.…”
Section: Methods (mentioning)
confidence: 99%
See 1 more Smart Citation
“…In this section, we conduct extensive experiments to evaluate the proposed L2E framework. We evaluate algorithm performance on the Leduc poker, the BigLeduc poker, and a Grid Soccer environment, the commonly used benchmark for opponent modeling (Lanctot et al, 2017;Steinberger, 2019;He et al, 2016). We first verify that the trained base policy using our L2E framework quickly exploit a wide range of opponents with only a few gradient updates.…”
Section: Methodsmentioning
confidence: 99%
“…Furthermore, how to generate diverse strategies has been preliminarily studied in the reinforcement learning community. Specifically, diverse strategies can be obtained in various ways, including adding a diversity regularization term to the optimization objective (Abdullah et al., 2019), randomly searching in a diverse parameter space (Plappert et al., 2018; Fortunato et al., 2018), using information-based strategy proposals (Eysenbach et al., 2018; Gupta et al., 2018), and searching for diverse strategies with evolutionary algorithms (Agapitos et al., 2008; Wang et al., 2019; Jaderberg et al., 2017; 2019). More recently, researchers from DeepMind proposed a league training paradigm to obtain a Grandmaster-level StarCraft II AI (i.e., AlphaStar) by training a diverse league of continually adapting strategies and counter-strategies (Vinyals et al., 2019).…”
Section: Strategy Generation (mentioning)
confidence: 99%
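The first approach listed in that statement, adding a diversity regularizer to the optimization objective, might look roughly like the following sketch; the loss shape, the KL-based diversity term, and the weight `beta` are illustrative assumptions rather than the formulation of any cited work.

```python
import torch

def diversity_regularized_loss(task_loss, policy_logits, reference_logits, beta=0.1):
    """Illustrative objective: minimize the task loss while encouraging the current
    policy to stay different from a reference policy (e.g. one found earlier) by
    rewarding a large KL divergence between their action distributions."""
    log_p = torch.log_softmax(policy_logits, dim=-1)
    log_q = torch.log_softmax(reference_logits, dim=-1)
    kl = torch.sum(torch.exp(log_p) * (log_p - log_q), dim=-1).mean()
    return task_loss - beta * kl  # subtracting the KL term rewards divergence

# Toy check: identical policies give zero diversity bonus.
logits = torch.zeros(4, 3)
print(diversity_regularized_loss(torch.tensor(1.0), logits, logits))  # -> tensor(1.)
```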
“…Double Neural CFR [8] and Deep CFR [9] combine deep neural networks with vanilla CFR and linear CFR (LCFR), respectively. In addition, Single Deep CFR (SDCFR) [10] is a simplified variant of Deep CFR that uses only one neural network to approximate the values in LCFR. Moreover, public chance sampling CFR (PCCFR) [11], variance reduction in MCCFR (VR-MCCFR) [12], and discounted CFR (DCFR) [13] are all variants of vanilla CFR.…”
Section: Introduction (mentioning)
confidence: 99%
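The distinction drawn in that statement, that SDCFR keeps a single value (advantage) network and derives its play from it by regret matching, rather than also training a separate average-strategy network as Deep CFR does, can be illustrated with a small sketch; the network architecture and interfaces below are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class AdvantageNet(nn.Module):
    """Maps an information-set encoding to one predicted advantage per action
    (a stand-in for the single value network SDCFR keeps)."""
    def __init__(self, obs_dim, n_actions, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs):
        return self.net(obs)

def policy_from_advantages(advantages):
    """Regret matching over predicted advantages: positive advantages are
    normalized into a distribution, with a uniform fallback if none is positive."""
    positive = torch.clamp(advantages, min=0.0)
    total = positive.sum(dim=-1, keepdim=True)
    uniform = torch.full_like(advantages, 1.0 / advantages.shape[-1])
    return torch.where(total > 0, positive / total.clamp(min=1e-12), uniform)

# Usage sketch: the policy is derived on the fly from the single network.
net = AdvantageNet(obs_dim=16, n_actions=3)
obs = torch.randn(2, 16)
print(policy_from_advantages(net(obs)))  # each row sums to 1
```

Because the playing policy is obtained on the fly from the predicted advantages, no second, average-strategy network has to be trained or stored in this setup.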
“…CFR has achieved great success in IIGs, and many improved variants have been proposed over the years [17,18,19,20,21,22]. However, there is still a problem that needs to be solved: how to improve the generalization of CFR-based methods.…”
Section: Introduction (mentioning)
confidence: 99%