2020 · Preprint
DOI: 10.48550/arxiv.2008.12234

The Advantage Regret-Matching Actor-Critic

Abstract: Regret minimization has played a key role in online learning, equilibrium computation in games, and reinforcement learning (RL). In this paper, we describe a general model-free RL method for no-regret learning based on repeated reconsideration of past behavior. We propose a model-free RL algorithm, the Advantage Regret-Matching Actor-Critic (ARMAC): rather than saving past state-action data, ARMAC saves a buffer of past policies, replaying through them to reconstruct hindsight assessments of past behavior. The…
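To make the mechanism described in the abstract concrete, here is a minimal sketch in Python. It only mirrors the stated idea of keeping a buffer of past policies and replaying an episode through a sampled one to score its actions in hindsight; the `reset`/`step` environment callables, the `value` critic function, and the policy-as-callable interface are all hypothetical placeholders, not the actual ARMAC losses or sampling scheme from the paper.

```python
import random

# Hypothetical interfaces (not from the paper): each policy is a callable
# state -> action, `value` is a callable state -> estimated state value,
# and `reset`/`step` stand in for an environment.

def sample_past_policy(policy_buffer):
    """ARMAC-style idea: keep a buffer of past policy snapshots and sample one
    to replay, instead of storing raw state-action transitions."""
    return random.choice(policy_buffer)

def hindsight_advantages(reset, step, past_policy, value, gamma=0.99):
    """Replay one episode under a sampled past policy and score each action
    with the current value estimates, giving a hindsight assessment."""
    out = []
    state, done = reset(), False
    while not done:
        action = past_policy(state)
        state_next, reward, done = step(state, action)
        target = reward + (0.0 if done else gamma * value(state_next))
        out.append((state, action, target - value(state)))  # one-step advantage
        state = state_next
    return out
```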

Cited by 3 publications (4 citation statements) · References 14 publications

“…MARL Algorithms in Zero-Sum Games. MARL methods have been applied to zero-sum games tracing back to the TD-Gammon project (Tesauro 1995). A large body of work (Zinkevich et al. 2007; Brown et al. 2019; Steinberger, Lerer, and Brown 2020; Gruslys et al. 2020) is based on regret minimization, and a well-known result is that the average of policies produced by self-play of regret-minimizing algorithms converges to the NE policy of zero-sum games (Freund and Schapire 1996). Another notable line of work (Littman 1994; Heinrich, Lanctot, and Silver 2015; Lanctot et al. 2017; Perolat et al. 2022) combines RL algorithms with game-theoretic approaches.…”
Section: Preliminary Markov Game
confidence: 99%
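The convergence result quoted above (Freund and Schapire 1996) is easy to see on a toy example. The following is a minimal sketch, not taken from any of the cited papers: two regret-matching learners in self-play on rock-paper-scissors, where the time-averaged strategies approach the Nash equilibrium (1/3, 1/3, 1/3).

```python
import numpy as np

# Row player's payoff matrix for rock-paper-scissors (zero-sum, so the
# column player's payoff is the negative transpose).
PAYOFF = np.array([[0, -1, 1],
                   [1, 0, -1],
                   [-1, 1, 0]], dtype=float)

def regret_matching(regrets):
    """Play each action in proportion to its positive cumulative regret."""
    positive = np.maximum(regrets, 0.0)
    total = positive.sum()
    if total > 0.0:
        return positive / total
    return np.full_like(regrets, 1.0 / regrets.size)  # fall back to uniform

def self_play(iterations=10000):
    n = PAYOFF.shape[0]
    regrets = [np.zeros(n), np.zeros(n)]
    strategy_sums = [np.zeros(n), np.zeros(n)]
    for _ in range(iterations):
        strategies = [regret_matching(regrets[0]), regret_matching(regrets[1])]
        for p in range(2):
            strategy_sums[p] += strategies[p]
        # Expected payoff of each pure action against the opponent's mix.
        u_row = PAYOFF @ strategies[1]
        u_col = -PAYOFF.T @ strategies[0]
        regrets[0] += u_row - strategies[0] @ u_row
        regrets[1] += u_col - strategies[1] @ u_col
    # The *average* strategies, not the last iterates, approach equilibrium.
    return [s / s.sum() for s in strategy_sums]

print(self_play())  # both averages approach (1/3, 1/3, 1/3)
```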
“…Therefore, many neural variants of CFR have been proposed. They approximate the behaviour of CFR via neural networks to scale to large-scale games (Li et al. 2019; Steinberger 2019; Gruslys et al. 2020; Hennes et al. 2020; Fu et al. 2021; McAleer et al. 2022). At each iteration, these methods estimate the counterfactual regrets and update the strategy using the estimated counterfactual regrets.…”
Section: Related Work
confidence: 99%
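As a point of reference for the "counterfactual regrets" mentioned in these statements, here is a minimal sketch of the quantity being estimated. The `action_values`, `strategy`, and `opponent_reach` inputs are hypothetical placeholders: tabular CFR computes them exactly by traversing the game tree, while the cited neural variants approximate them.

```python
import numpy as np

def counterfactual_regrets(action_values, strategy, opponent_reach):
    """Instantaneous counterfactual regret at one information set: the gap
    between each action's counterfactual value and the value of the current
    strategy, weighted by the opponents' reach probability."""
    infoset_value = strategy @ action_values
    return opponent_reach * (action_values - infoset_value)

# Example: three legal actions; the current strategy under-plays action 0.
print(counterfactual_regrets(np.array([1.0, 0.2, -0.5]),
                             np.array([0.3, 0.5, 0.2]),
                             0.25))
# -> [ 0.175 -0.025 -0.2 ]
```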
“…Due to the large-scale state space in most real-world scenarios, it is impossible to traverse the entire game tree and use tables to represent strategies. To sidestep the issue, many neural variants of CFR have been proposed (Li et al. 2019; Gruslys et al. 2020; Hennes et al. 2020; Steinberger, Lerer, and Brown 2020; Fu et al. 2021; McAleer et al. 2022). At each time, they estimate the counterfactual regrets and update the strategy using the estimated regrets.…”
Section: Introduction
confidence: 99%
“…However, Deep CFR uses external sampling, which may be impractical for games with a large branching factor such as Stratego and Barrage Stratego. DREAM (Steinberger et al., 2020) and ARMAC (Gruslys et al., 2020) are model-free regret-based deep learning approaches.…”
Section: Related Work
confidence: 99%