2006
DOI: 10.1007/11871842_29

Bandit Based Monte-Carlo Planning

Abstract: For large state-space Markovian Decision Problems, Monte-Carlo planning is one of the few viable approaches to find near-optimal solutions. In this paper we introduce a new algorithm, UCT, that applies bandit ideas to guide Monte-Carlo planning. In finite-horizon or discounted MDPs the algorithm is shown to be consistent, and finite sample bounds are derived on the estimation error due to sampling. Experimental results show that in several domains, UCT is significantly more efficient than its alternativ…
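The "bandit ideas" the abstract refers to are, at their core, UCB1-style action selection applied at each tree node. A minimal sketch of that selection rule is below; the function name, the `(value_sum, visit_count)` representation, and the exploration constant `c` are illustrative choices, not taken from the paper's pseudocode.

```python
import math

def uct_select(children, c=1.414):
    """Return the index of the child maximizing a UCB1-style score.

    children: list of (value_sum, visit_count) pairs for one node.
    The parent's visit count is taken as the sum of child visits.
    Unvisited children score infinity, so each child is tried at
    least once before the exploitation/exploration trade-off kicks in.
    """
    total = sum(n for _, n in children)

    def score(child):
        w, n = child
        if n == 0:
            return float("inf")  # force exploration of unvisited children
        # mean reward + exploration bonus that shrinks as n grows
        return w / n + c * math.sqrt(math.log(total) / n)

    return max(range(len(children)), key=lambda i: score(children[i]))
```

In a full UCT implementation this rule is applied recursively from the root to select a leaf, after which a Monte-Carlo rollout is run and its result backed up along the path.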

Cited by 2,088 publications (1,966 citation statements); references 7 publications.
“…We assume in this section that the reader is familiar with the Monte-Carlo Tree Search (MCTS) and Upper Confidence Tree (UCT) literature [4,7,9]. We here focus on the experimental application of MCTS to acyclic GSA games.…”
Section: Upper Confidence Trees For Games With Simultaneous Actions
Mentioning confidence: 99%
“…The reader is referred to [7] for more information on UCT; we here focus on the extension of UCT to games with nodes with simultaneous actions, i.e. GSA, in the acyclic case.…”
Section: Algorithm
Mentioning confidence: 99%
“…Monte-Carlo Tree Search (MCTS [5,7,11]) is a recent tool for difficult planning tasks. Impressive results have already been produced in the case of the game of Go [7,10].…”
Section: Introduction
Mentioning confidence: 99%
“…The focus is on the parts of the tree in which the expected gain is the highest. For estimating which situation should be further analyzed, several algorithms have been proposed: UCT [11] (Upper Confidence Trees) focuses on the proportion of winning simulations plus an uncertainty measure; AMAF [4,1,10] (All Moves As First, also termed RAVE for Rapid Action-Value Estimates in the MCTS context) focuses on a compromise between UCT and heuristic information extracted from the simulations; BAST [6] (Bandit Algorithm for Search in Trees) uses UCB-like bounds modified through the overall number of nodes in the tree. Other related algorithms have been proposed, as in [5], essentially using a decreasing impact of a heuristic (pattern-dependent) bias as the number of simulations increases.…”
Section: Introduction
Mentioning confidence: 99%
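The AMAF/RAVE "compromise" mentioned in the quote above is typically realized as a weighted blend of the UCT value and the AMAF value, with the weight shifting toward the UCT value as a node accumulates visits. The sketch below shows one common form of that blend; the linear schedule `k / (k + n)` and the constant `k` are illustrative assumptions, not the exact formula of any one cited paper.

```python
def rave_score(q_uct, n, q_amaf, k=1000):
    """Blend a node's UCT value with its AMAF value, RAVE-style.

    q_uct:  mean reward from simulations through this node (n visits).
    q_amaf: all-moves-as-first estimate gathered from simulations.
    beta starts at 1 (pure AMAF, useful when n is small and q_uct is
    noisy) and decays toward 0, so the UCT value dominates eventually.
    """
    beta = k / (k + n)
    return beta * q_amaf + (1 - beta) * q_uct
```

With no visits the heuristic AMAF estimate is used as-is; after many visits the score converges to the plain UCT mean, recovering consistency.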