2016
DOI: 10.1609/aaai.v30i1.10013

Poker-CNN: A Pattern Learning Strategy for Making Draws and Bets in Poker Games Using Convolutional Networks

Abstract: Poker is a family of card games that includes many variations. We hypothesize that most poker games can be solved as a pattern matching problem, and propose creating a strong poker playing system based on a unified poker representation. Our poker player learns through iterative self-play, and improves its understanding of the game by training on the results of its previous actions without sophisticated domain knowledge. We evaluate our system on three poker games: single player video poker, two-player Limi…
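
A minimal sketch may make the abstract's training loop concrete: the player repeatedly plays hands and then retrains on the outcomes of its own previous actions. Everything below is an illustrative assumption (a toy one-decision game, with a preference table standing in for the convolutional network), not the paper's actual code.

import random

class TinyPolicy:
    """Stand-in for the CNN: a bare table of action preferences."""

    def __init__(self):
        self.pref = {"fold": 1.0, "call": 1.0, "raise": 1.0}

    def act(self):
        # Sample an action in proportion to its current preference.
        actions, weights = zip(*self.pref.items())
        return random.choices(actions, weights=weights)[0]

    def fit(self, history):
        # Nudge preferences toward actions that earned positive payoffs.
        for action, payoff in history:
            self.pref[action] = max(0.1, self.pref[action] + 0.1 * payoff)

def play_hand(player):
    # Toy "hand" (assumed): a noisy payoff whose mean depends on the action.
    action = player.act()
    payoff = random.gauss({"fold": -0.2, "call": 0.0, "raise": 0.1}[action], 1.0)
    return action, payoff

def self_play(iterations=20, hands_per_iteration=500):
    player = TinyPolicy()
    for _ in range(iterations):
        history = [play_hand(player) for _ in range(hands_per_iteration)]
        player.fit(history)  # train on the results of previous actions
    return player

In the real system, the toy game would be replaced by full poker hands and the preference table by a ConvNet over the unified card/action representation.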

Cited by 12 publications (8 citation statements). References 19 publications.
“…For state representation comparison, we consider three alternative methods: 1) Vectorized state representation like DeepCFR (Brown et al 2019) (Vector). It uses vectors to represent the card information (two 52-dimensional vectors) and the action information (each betting position represented by a binary value specifying whether a bet has occurred and a float value specifying the bet size); 2) PokerCNN-based state representation (Yakovenko et al 2016) (PokerCNN) uses 3D tensors to represent card and action information together and uses a single ConvNet to learn features; 3) State representation without history information (W/O History Information) is similar to AlphaHoldem except that it does not contain history action information.…”
Section: Ablation Studies (mentioning)
confidence: 99%
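
The two encodings contrasted in this excerpt can be sketched directly. The dimensions of the flat encoding follow the text (two 52-dimensional card vectors, plus a binary occurrence flag and a float bet size per betting position); the number of betting positions and the exact tensor layout are assumptions made for illustration.

import numpy as np

NUM_CARDS = 52           # standard deck
NUM_BET_POSITIONS = 24   # assumed; the excerpt does not give this number

def vector_state(hole_cards, board_cards, bets):
    """DeepCFR-style flat encoding: card vectors plus betting features."""
    hole = np.zeros(NUM_CARDS)
    hole[hole_cards] = 1.0              # 52-dim one-hot hole cards
    board = np.zeros(NUM_CARDS)
    board[board_cards] = 1.0            # 52-dim one-hot board cards
    actions = np.zeros(2 * NUM_BET_POSITIONS)
    for pos, size in bets.items():
        actions[2 * pos] = 1.0          # a bet occurred at this position
        actions[2 * pos + 1] = size     # the (normalized) bet size
    return np.concatenate([hole, board, actions])

def tensor_state(hole_cards, board_cards, bets):
    """PokerCNN-style encoding: one 3D tensor fed to a single ConvNet."""
    state = np.zeros((3, 4, 13))        # (channel, suit, rank); layout assumed
    for c in hole_cards:
        state[0, c // 13, c % 13] = 1.0
    for c in board_cards:
        state[1, c // 13, c % 13] = 1.0
    # Action channel: broadcast the average bet size so the ConvNet sees
    # card and action information together (a simplification).
    state[2, :, :] = sum(bets.values()) / max(len(bets), 1)
    return state

print(vector_state([0, 13], [25, 38, 51], {0: 0.5}).shape)  # (152,)
print(tensor_state([0, 13], [25, 38, 51], {0: 0.5}).shape)  # (3, 4, 13)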
“…Some recent works have also made efforts in this direction. NFSP (Heinrich and Silver 2016) and Poker-CNN (Yakovenko et al 2016) have approached state-of-the-art performance in limit Texas hold'em. DeepCFR (Brown et al 2019) further improves performance by approximating CFR's behavior in the game using deep neural networks and Discounted CFR (Brown and Sandholm 2019a).…”
Section: Introduction (mentioning)
confidence: 99%
“…In poker, some previous work exists on directly learning from observations. [12] learned to play simplified poker variants from observations of hands, which resulted in good, but not very strong, performance. [13] proposed a self-play algorithm that is guaranteed to converge to a Nash equilibrium.…”
Section: Related Work (mentioning)
confidence: 99%
“…2) Using Reinforcement Learning to solve MDPs with large state-action spaces: Reinforcement learning (RL) has been used to solve sequential decision-making problems modelled as MDPs. Notable achievements of RL include playing sequential decision-making games such as chess [21] and poker [22] at a superhuman level. To the best of our knowledge, only Mori [19] has to date investigated the potential of RL to devise an optimal DM policy.…”
Section: Background and Related Work (mentioning)
confidence: 99%
“…ICAO denes the engine's fuel ows at four different thrust levels: 100%, 85%, 30%, and 7% of engine maximum power corresponding to four different conditions: take-off, climb out, approach and idle, respectively. The thrust level ϵ is specied by the thrust at the time t and the engine's maximum thrust using equation (22).…”
Section: B Design Of Evaluation Experimentsmentioning
confidence: 99%
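
Equation (22) is referenced but not reproduced in this excerpt. A plausible form, assuming the thrust level is simply the ratio of instantaneous thrust to the engine's maximum rated thrust, would be

    % plausible reconstruction, not the citing paper's actual equation
    \epsilon(t) = \frac{F(t)}{F_{\max}}

where F(t) is the thrust at time t and F_max is the engine's maximum thrust.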