2016
DOI: 10.1016/j.tcs.2016.06.029

Adaptive playouts for online learning of policies during Monte Carlo Tree Search

Abstract: Monte-Carlo Tree Search evaluates positions with the help of a playout policy. If the playout policy evaluates a position incorrectly, the tree search can have difficulty finding the correct move because of the large search space. This paper explores adaptive playout policies, which improve the playout policy during a tree search. Using policy gradient reinforcement learning techniques, we optimize the playout policy to give better evaluations. We tested the algorithm in Computer Go and m…
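To make the idea of improving a playout policy with policy gradients concrete, here is a minimal sketch of a plain REINFORCE update on a softmax policy. It is not the algorithm from the paper: the abstract does not give the features, the update rule, or how the update is tied into the tree search. Everything named here is an assumption for illustration, including the state-independent policy with one weight per move, the toy reward (the fraction of "a" moves in a playout), the running-average baseline, and the helper names `softmax_probs`, `sample_playout`, `reinforce_update`, and `train_toy`.

```python
import math
import random


def softmax_probs(weights, moves):
    """Softmax distribution over `moves` from per-move weights (hypothetical policy)."""
    m = max(weights[a] for a in moves)  # subtract the max for numerical stability
    exps = [math.exp(weights[a] - m) for a in moves]
    total = sum(exps)
    return [e / total for e in exps]


def sample_playout(weights, moves, length, rng):
    """Sample a playout of `length` moves from the current softmax policy."""
    probs = softmax_probs(weights, moves)
    return rng.choices(moves, weights=probs, k=length)


def reinforce_update(weights, moves, chosen, reward, baseline, lr):
    """One REINFORCE step, in place.

    For a softmax policy, d/dw[a] log pi(played) = 1[a == played] - pi(a).
    The gradient is evaluated at the weights held before this call.
    """
    probs = softmax_probs(weights, moves)
    advantage = reward - baseline
    for played in chosen:
        for a, p in zip(moves, probs):
            indicator = 1.0 if a == played else 0.0
            weights[a] += lr * advantage * (indicator - p)


def train_toy(seed=0, iterations=2000, length=5, lr=0.1):
    """Toy loop: the reward is the fraction of 'a' moves in each playout (an assumption)."""
    rng = random.Random(seed)
    moves = ["a", "b", "c"]
    weights = {a: 0.0 for a in moves}
    baseline = 0.0
    for _ in range(iterations):
        chosen = sample_playout(weights, moves, length, rng)
        reward = chosen.count("a") / length
        reinforce_update(weights, moves, chosen, reward, baseline, lr)
        baseline += 0.05 * (reward - baseline)  # running average of past rewards
    return weights


if __name__ == "__main__":
    trained = train_toy()
    print(softmax_probs(trained, ["a", "b", "c"]))
```

In this toy setting the policy shifts probability toward the move that earns reward. In an adaptive-playout setting, the reward would presumably come from the playout outcome during the search rather than from a fixed rule.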

Cited by 6 publications (2 citation statements)
References 17 publications
“…The success of MCTS applications to games with perfect information, such as Chess and Go, motivated the researchers to apply it to games with more complicated rules such as card games, real-time strategy (RTS) and other … [caption: Algorithm 1: MCTS with adaptive playouts (Graf and Platzner, 2016)]…”
Section: Games With Imperfect Information (mentioning)
confidence: 99%
“…PPA is therefore closely related to reinforcement learning whereas MAST is about statistics on moves. Adaptive sampling techniques related to PPA have also been tried recently for Go with success [23,24].…”
Section: Introduction (mentioning)
confidence: 99%