2011
DOI: 10.1007/978-3-642-25566-3_32

Continuous Upper Confidence Trees

Abstract: Upper Confidence Trees are a very efficient tool for solving Markov Decision Processes; originating in difficult games like the game of Go, it is in particular surprisingly efficient in high-dimensional problems. It is known that it can be adapted to continuous domains in some cases (in particular continuous action spaces). We here present an extension of Upper Confidence Trees to continuous stochastic problems. We (i) show a deceptive problem on which the classical Upper Confidence Tree approach doe…




Cited by 131 publications (107 citation statements, all of type "mentioning").
References 8 publications.
“…We have shown the consistency of our modified version, with polynomial exploration and double progressive widening, for a more general case. [6] have shown that the classical UCT is not consistent in this case and already proposed double progressive widening; we here give a proof of the consistency of this approach, when we use polynomial exploration; [6] was using logarithmic exploration.…”
Section: Experimental Validation
confidence: 76%
“…For a random node z, we actually have the same property, depending on the double progressive widening constant α_d: this is the so-called double progressive widening trick ([6]; see also [9]). …”
Section: PUCT Algorithm
confidence: 99%
“…What follows is the formal description of the state-of-the-art continuous MCTS, i.e. MCTS with Double Progressive Widening (MCTS-DPW), as seen in [3]. As the reader can see, it mainly requires two things: (i) a transition function, capable of simulating what happens when an action a is taken in state s, and returning a new state s′ and a reward r; (ii) a default policy ϕ, capable of returning an action a, given a state s. When nothing is specified, it is assumed that this function returns a random action, following a random distribution that covers the entire set of feasible actions in state s. Let nbVisits(s) ← nbVisits(s) + 1 and let t = nbVisits(s). Let k = ⌈C·t^α⌉.…”
Section: Monte-Carlo Tree Search and Upper Confidence Trees
confidence: 99%
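The widening rule quoted in this citation statement, k = ⌈C·t^α⌉, can be sketched in a few lines; the constant C = 1, the exponent α = 0.5, and the Node bookkeeping below are illustrative assumptions, not the cited implementation:

```python
import math

def widening_limit(t, C=1.0, alpha=0.5):
    """Maximum number of children allowed after t visits: k = ceil(C * t^alpha).

    C and alpha are assumed values for illustration; the cited papers tune them.
    """
    return math.ceil(C * t ** alpha)

class Node:
    """Minimal tree node: a visit counter and a map from actions to children."""
    def __init__(self):
        self.visits = 0
        self.children = {}  # action -> child Node

def visit(node, sample_action):
    """One progressive-widening decision at a node.

    A new action is sampled (via the default policy, here the hypothetical
    sample_action callable) only while k = ceil(C * t^alpha) exceeds the
    current number of children; otherwise selection proceeds among the
    existing children (e.g. by UCB).
    """
    node.visits += 1
    t = node.visits
    k = widening_limit(t)
    if k > len(node.children):
        a = sample_action()
        node.children.setdefault(a, Node())
    return node.children
```

With α = 0.5 the number of explored actions grows like the square root of the visit count, which is the point of progressive widening: the branching factor stays finite even over a continuous action space.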
“…n(s) is the number of simulations including state s, and n(s, a) is the number of simulations including action a in state s; they are all initialized to 0. Indeed, the implementation that we use (from the Mash project) is slightly more sophisticated and uses progressive widening [10], [11]; this is not relevant for this paper.…”
Section: A. The Rigorous Approach: Monte-Carlo Tree Search with Rejection
confidence: 99%
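The counters n(s) and n(s, a) described in this last statement are exactly what the standard UCB1 selection rule consumes once the child set has been (progressively) widened. A hedged sketch, where the exploration constant c = √2 and the stats layout are assumptions for illustration:

```python
import math

def ucb_select(stats, c=math.sqrt(2.0)):
    """Pick the action maximizing mean reward plus an exploration bonus.

    stats maps action -> (n_sa, total_reward), where n_sa is the number of
    simulations through (s, a) and total_reward is their summed return.
    n(s) is recovered as the sum of the per-action counts.
    """
    n_s = sum(n for n, _ in stats.values())  # n(s)

    def score(a):
        n_sa, total = stats[a]
        return total / n_sa + c * math.sqrt(math.log(n_s) / n_sa)

    return max(stats, key=score)
```

Note that a rarely tried action can outscore one with a higher empirical mean, since the bonus term √(log n(s) / n(s, a)) grows as n(s, a) shrinks; this is the exploration/exploitation trade-off the counters exist to support.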