2022 · Preprint
DOI: 10.48550/arxiv.2203.05920

Generalized Bandit Regret Minimizer Framework in Imperfect Information Extensive-Form Game

Abstract: Regret minimization methods are a powerful tool for learning an approximate Nash equilibrium (NE) in two-player zero-sum imperfect-information extensive-form games (IIEGs). We consider the problem in the interactive bandit-feedback setting, where the dynamics of the IIEG are unknown. In general, only the interactive trajectory and the value $v(z^t)$ of the reached terminal node are revealed. To learn an NE, the regret minimizer is required to estimate the full-feedback loss gradient $\ell^t$ from $v(z^t)$ and minimize the regret. In…
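The estimation step the abstract describes is the crux of the bandit setting: a single observed terminal value $v(z^t)$ must be turned into an unbiased estimate of the full-feedback loss gradient $\ell^t$ before a standard regret minimizer can run. The following is a minimal sketch of that idea only, not the paper's generalized framework: it uses a toy one-shot game standing in for a single infoset, an inverse-propensity loss estimator, regret matching as the minimizer, and a uniform exploration mix; the loss values and the exploration rate are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def estimate_loss(n_actions, sampled_action, value, behavior_probs):
    """Inverse-propensity estimate of the full loss vector: only the
    sampled entry is nonzero, scaled by 1/p so that E[loss_hat] equals
    the true loss vector (unbiased, but high-variance)."""
    loss_hat = np.zeros(n_actions)
    loss_hat[sampled_action] = value / behavior_probs[sampled_action]
    return loss_hat

def regret_matching(cum_regret):
    """Play actions in proportion to positive cumulative regret."""
    pos = np.maximum(cum_regret, 0.0)
    s = pos.sum()
    return pos / s if s > 0 else np.full(len(cum_regret), 1.0 / len(cum_regret))

# Toy one-shot "game" standing in for a single infoset of an IIEG;
# the per-action losses below are hypothetical.
true_loss = np.array([0.9, 0.2, 0.6])
n_actions = len(true_loss)
cum_regret = np.zeros(n_actions)
eps = 0.05  # uniform exploration keeps sampling probabilities away from 0

for t in range(20_000):
    policy = regret_matching(cum_regret)
    behavior = (1 - eps) * policy + eps / n_actions
    a = rng.choice(n_actions, p=behavior)        # interact: sample a trajectory
    v = true_loss[a]                             # bandit feedback: v(z^t) only
    loss_hat = estimate_loss(n_actions, a, v, behavior)
    cum_regret += policy @ loss_hat - loss_hat   # regret vs. each fixed action

print("learned policy:", np.round(regret_matching(cum_regret), 3))
```

The 1/p importance weighting is what makes the estimator unbiased from a single sample, at the cost of variance; managing that variance through the choice of sampling policy and regret minimizer is precisely why the full-feedback gradient estimation problem is nontrivial in this setting.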

Cited by 1 publication (1 citation statement) · References 17 publications
“…The authors in [117] recently bridged several standing gaps between NFG and EFG learning by directly transferring desirable properties in NFGs to EFGs, guaranteeing simultaneously last-iterate convergence, lower dependence on the game size, and constant regret in games. Besides, bandit feedback is of practical importance in real-world applications for II-ZSGs [118], [119], where only the interactive trajectory and the payoff of the reached terminal node can be observed without prior knowledge of the game, such as the tree structure, the observation/state space, and transition probabilities (for Markov games) [120]. On the other hand, multi-player II-ZSGs are more challenging and thus have been less researched except for a handful of works, for example, Pluribus [121], the first multi-player poker agent, has defeated top humans in six-player no-limit Texas Hold'em poker (the most prevalent poker in the world) [122], and other endeavors [119], [123]- [126].…”
Section: A. Zero-Sum Games (ZSGs) · mentioning
Confidence: 99%