2019 IEEE Conference on Games (CoG)
DOI: 10.1109/cig.2019.8848037

Learning Policies from Self-Play with Policy Gradients and MCTS Value Estimates

Abstract: In recent years, state-of-the-art game-playing agents often involve policies that are trained in self-play processes where Monte Carlo tree search (MCTS) algorithms and trained policies iteratively improve each other. The strongest results have been obtained when policies are trained to mimic the search behaviour of MCTS by minimising a cross-entropy loss. Because MCTS, by design, includes an element of exploration, policies trained in this manner are also likely to exhibit a similar extent of exploration. …
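
To make the contrast in the abstract concrete, the sketch below (not the authors' code; the network size, optimiser, and the cross_entropy_update / policy_gradient_update helpers are all hypothetical) shows (a) the standard cross-entropy loss towards an MCTS visit-count distribution and (b) a policy-gradient update driven by per-action MCTS value estimates, which is the direction indicated by the paper's title.

    import torch
    import torch.nn.functional as F

    feature_dim, num_actions = 8, 4                      # hypothetical sizes
    policy_net = torch.nn.Linear(feature_dim, num_actions)
    opt = torch.optim.SGD(policy_net.parameters(), lr=1e-2)

    def cross_entropy_update(features, visit_counts):
        # (a) Standard target: mimic MCTS by matching its visit-count distribution,
        # which also reproduces whatever exploration the search performed.
        target = visit_counts / visit_counts.sum()
        loss = -(target * F.log_softmax(policy_net(features), dim=-1)).sum()
        opt.zero_grad(); loss.backward(); opt.step()

    def policy_gradient_update(features, q_estimates):
        # (b) Sketch of a policy-gradient step whose returns are per-action value
        # estimates read from the MCTS search tree, pushing the policy towards
        # actions the search evaluates as strong rather than merely visits often.
        log_probs = F.log_softmax(policy_net(features), dim=-1)
        probs = log_probs.exp().detach()
        loss = -(probs * q_estimates * log_probs).sum()
        opt.zero_grad(); loss.backward(); opt.step()

    # Hypothetical outputs of a single MCTS call from self-play.
    features = torch.randn(feature_dim)
    cross_entropy_update(features, torch.tensor([40.0, 30.0, 20.0, 10.0]))
    policy_gradient_update(features, torch.tensor([0.6, 0.1, -0.2, -0.5]))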

Cited by 6 publications (11 citation statements)
References 25 publications (51 reference statements)
“…Consequently, a computationally heavy process is run just once (offline) and then this time-efficient problem representation can be used in subsequent online applications. The approach of combining an MCTS trainer with a fast learning-based representation can be hybridised in various ways, specific to the particular problem / domain of interest (Guo et al., 2014; Kartal et al., 2019a; Soemers et al., 2019). …”
Section: Discussion (citation type: mentioning)
confidence: 99%
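
As a purely illustrative sketch of the offline/online split this statement describes (toy heavy_planner and fast_policy stand-ins; not any cited paper's pipeline): the heavy process labels states once offline, and a cheap fitted representation answers online queries without search.

    # Toy stand-in for a computationally heavy planner; in the cited settings
    # this role is played by a full MCTS run per state.
    def heavy_planner(state):
        scores = [-(state - a) ** 2 for a in range(3)]
        return scores.index(max(scores))

    # Offline phase, run once: label a batch of states with the heavy planner.
    states = list(range(3))
    labels = {s: heavy_planner(s) for s in states}

    # Fast learning-based representation: here just a fitted lookup, standing in
    # for a neural network or other function approximator trained on `labels`.
    def fast_policy(state):
        return labels[state]

    # Online phase, run many times: no search, just a cheap query.
    assert all(fast_policy(s) == heavy_planner(s) for s in states)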
“…where P(m_i) is the output from the neural network trained on human data, C_BT is a weight controlling how the bias blends with the UCT score, and K is a parameter controlling the rate at which the bias decreases. Soemers et al. (2019) show that it is possible to learn a policy in an MDP using the policy gradient method and value estimates taken directly from the MCTS algorithm. Kartal et al. (2019a) propose a method to combine deep reinforcement learning and MCTS, where the latter acts as a demonstrator for the RL component. …”
Section: Mimicking Human Play (citation type: mentioning)
confidence: 99%
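
The formula itself is elided in the quote above. Purely as an illustration of how a learned prior P(m_i) can be blended into UCT selection with a bias that decays with visit count, the following sketch uses one common decay form; the constants, decay shape, and function name are assumptions, not the cited paper's exact definition.

    import math

    def biased_uct_score(q_sum, n_i, n_parent, p_i, c_uct=1.41, c_bt=0.5, k=100):
        # Standard UCT term (exploitation + exploration) for move m_i.
        if n_i == 0:
            return float("inf")
        uct = q_sum / n_i + c_uct * math.sqrt(math.log(n_parent) / n_i)
        # Prior bias from a learned policy P(m_i), weighted by C_BT and decaying
        # as the visit count n_i grows, with K controlling how fast it fades.
        # This particular decay form is an illustrative assumption only.
        bias = c_bt * p_i * k / (k + n_i)
        return uct + bias

    # Example: a move the prior likes vs. a slightly higher-valued move it does not.
    print(biased_uct_score(q_sum=6.0, n_i=10, n_parent=50, p_i=0.7))
    print(biased_uct_score(q_sum=7.0, n_i=10, n_parent=50, p_i=0.1))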
“…It is based on MCTS in which the simulation phase is replaced by a deep RL model that acts in the environment and chooses actions according to its policy rather than randomly. Soemers et al. (2019) show that it is possible to learn a policy in an MDP using the policy gradient method and value estimates taken directly from the MCTS algorithm. Kartal et al. (2019a) propose a method to combine deep RL and MCTS, where the latter acts as a demonstrator for the RL component. …”
Section: AlphaGo-Inspired Approaches (citation type: mentioning)
confidence: 99%
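
A minimal sketch of the substitution this statement describes, assuming a toy game and stand-in policies (none of this is the cited system's code): the MCTS simulation phase is a playout whose action-selection function can be either uniform-random or a trained policy.

    import random

    def rollout(state, step, is_terminal, reward, choose_action, max_depth=50):
        # Generic MCTS simulation phase: play out from `state` and return the
        # final reward. `choose_action` is uniform-random in classic MCTS; in the
        # hybrid described above it is a trained deep RL policy instead.
        depth = 0
        while not is_terminal(state) and depth < max_depth:
            state = step(state, choose_action(state))
            depth += 1
        return reward(state)

    # Toy game (assumption): walk on 0..10, terminal at either end, win at 10.
    actions = (-1, +1)
    random_policy = lambda s: random.choice(actions)   # classic random playout
    learned_policy = lambda s: +1                      # stand-in for an RL policy
    value = rollout(5,
                    step=lambda s, a: s + a,
                    is_terminal=lambda s: s in (0, 10),
                    reward=lambda s: 1.0 if s == 10 else 0.0,
                    choose_action=learned_policy)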
“…Therefore, the first 10 moves will contain MCTS's exploration, and the rest will feature only the most-visited action. We note that there exists research in this area, with a focus on removing exploration elements from MCTS policy targets with the hope of aiding interpretability [32]. …”
Section: Hyperparameters and General Performance Improvements (citation type: mentioning)
confidence: 99%
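
As a concrete illustration of the scheme in this statement (hypothetical helper and data; not the cited paper's implementation), the sketch below samples moves in proportion to MCTS visit counts for the first 10 moves and plays only the most-visited move afterwards.

    import random

    def select_move(visit_counts, move_number, exploration_moves=10):
        # For the first `exploration_moves` moves, sample in proportion to MCTS
        # visit counts, so the played move keeps the search's exploration; after
        # that, always play the most-visited move. The cut-off of 10 follows the
        # quoted setup; the rest of this helper is an illustrative assumption.
        moves = list(visit_counts)
        if move_number < exploration_moves:
            weights = [visit_counts[m] for m in moves]
            return random.choices(moves, weights=weights)[0]
        return max(moves, key=lambda m: visit_counts[m])

    counts = {"a": 40, "b": 30, "c": 20, "d": 10}
    print(select_move(counts, move_number=3))    # stochastic, proportional to visits
    print(select_move(counts, move_number=30))   # deterministic: "a"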