2006
DOI: 10.1007/11871842_29

Bandit Based Monte-Carlo Planning

Abstract: For large state-space Markovian Decision Problems, Monte-Carlo planning is one of the few viable approaches to find near-optimal solutions. In this paper we introduce a new algorithm, UCT, that applies bandit ideas to guide Monte-Carlo planning. In finite-horizon or discounted MDPs the algorithm is shown to be consistent, and finite sample bounds are derived on the estimation error due to sampling. Experimental results show that in several domains, UCT is significantly more efficient than its alternativ…
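The "bandit ideas" the abstract refers to are, at their core, UCB1-style action selection applied at each tree node. A minimal sketch of that selection rule is below; the function name, the `(value_sum, visit_count)` representation, and the exploration constant `c` are illustrative choices, not taken from the paper's pseudocode.

```python
import math

def uct_select(children, c=1.414):
    """Return the index of the child maximizing a UCB1-style score.

    children: list of (value_sum, visit_count) pairs for one node.
    The parent's visit count is taken as the sum of child visits.
    Unvisited children score infinity, so each child is tried at
    least once before the exploitation/exploration trade-off kicks in.
    """
    total = sum(n for _, n in children)

    def score(child):
        w, n = child
        if n == 0:
            return float("inf")  # force exploration of unvisited children
        # mean reward + exploration bonus that shrinks as n grows
        return w / n + c * math.sqrt(math.log(total) / n)

    return max(range(len(children)), key=lambda i: score(children[i]))
```

In a full UCT implementation this rule is applied recursively from the root to select a leaf, after which a Monte-Carlo rollout is run and its result backed up along the path.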

Cited by 2,088 publications (1,966 citation statements); references 7 publications.
“…We assume in this section that the reader is familiar with the Monte-Carlo Tree Search (MCTS) and Upper Confidence Tree (UCT) literature [4,7,9]. We here focus on the experimental application of MCTS to acyclic GSA games.…”
Section: Upper Confidence Trees For Games With Simultaneous Actions
Mentioning confidence: 99%
“…The reader is referred to [7] for more information on UCT; we here focus on the extension of UCT to games with nodes with simultaneous actions, i.e. GSA, in the acyclic case.…”
Section: Algorithm
Mentioning confidence: 99%
“…Monte-Carlo Tree Search (MCTS [5,7,11]) is a recent tool for difficult planning tasks. Impressive results have already been produced in the case of the game of Go [7,10].…”
Section: Introduction
Mentioning confidence: 99%
“…The focus is on the parts of the tree in which the expected gain is the highest. For estimating which situation should be further analyzed, several algorithms have been proposed: UCT [11] (Upper Confidence Trees) focuses on the proportion of winning simulations plus an uncertainty measure; AMAF [4,1,10] (All Moves As First, also termed RAVE for Rapid Action-Value Estimates in the MCTS context) focuses on a compromise between UCT and heuristic information extracted from the simulations; BAST [6] (Bandit Algorithm for Search in Trees) uses UCB-like bounds modified through the overall number of nodes in the tree. Other related algorithms have been proposed, as in [5], essentially using a decreasing impact of a heuristic (pattern-dependent) bias as the number of simulations increases.…”
Section: Introduction
Mentioning confidence: 99%
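The AMAF/RAVE "compromise" mentioned in the quote above is typically realized as a weighted blend of the UCT value and the AMAF value, with the weight shifting toward the UCT value as a node accumulates visits. The sketch below shows one common form of that blend; the linear schedule `k / (k + n)` and the constant `k` are illustrative assumptions, not the exact formula of any one cited paper.

```python
def rave_score(q_uct, n, q_amaf, k=1000):
    """Blend a node's UCT value with its AMAF value, RAVE-style.

    q_uct:  mean reward from simulations through this node (n visits).
    q_amaf: all-moves-as-first estimate gathered from simulations.
    beta starts at 1 (pure AMAF, useful when n is small and q_uct is
    noisy) and decays toward 0, so the UCT value dominates eventually.
    """
    beta = k / (k + n)
    return beta * q_amaf + (1 - beta) * q_uct
```

With no visits the heuristic AMAF estimate is used as-is; after many visits the score converges to the plain UCT mean, recovering consistency.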