2011
DOI: 10.1007/978-3-642-25566-3_32

Continuous Upper Confidence Trees

Abstract: Upper Confidence Trees are a very efficient tool for solving Markov Decision Processes; originating in difficult games like the game of Go, it is in particular surprisingly efficient in high-dimensional problems. It is known that it can be adapted to continuous domains in some cases (in particular continuous action spaces). We here present an extension of Upper Confidence Trees to continuous stochastic problems. We (i) show a deceptive problem on which the classical Upper Confidence Tree approach doe…




Cited by 131 publications (107 citation statements, all of type "mentioning").
References 8 publications.
“…We have shown the consistency of our modified version, with polynomial exploration and double progressive widening, for a more general case. [6] have shown that the classical UCT is not consistent in this case and already proposed double progressive widening; we here give a proof of the consistency of this approach, when we use polynomial exploration; [6] was using logarithmic exploration.…”
Section: Experimental Validation
confidence: 76%
“…For a random node z, we actually have the same property, depending on the double progressive widening constant α_d: this is the so-called double progressive widening trick ([6]; see also [9]). …”
Section: PUCT Algorithm
confidence: 99%
“…What follows is the formal description of the state-of-the-art continuous MCTS, i.e. MCTS with Double Progressive Widening (MCTS-DPW), as seen in [3]. As the reader can see, it mainly requires two things: (i) a transition function, capable of simulating what happens when an action a is taken in state s, and returning a new state s′ and a reward r; (ii) a default policy ϕ, capable of returning an action a, given a state s. When nothing is specified, it is assumed that this function returns a random action, following a random distribution that covers the entire set of feasible actions in state s. Let nbVisits(s) ← nbVisits(s) + 1 and let t = nbVisits(s). Let k = ⌈C·t^α⌉.…”
Section: Monte-Carlo Tree Search and Upper Confidence Trees
confidence: 99%
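The widening rule quoted in this citation statement, k = ⌈C·t^α⌉, can be sketched in a few lines; the constant C = 1, the exponent α = 0.5, and the Node bookkeeping below are illustrative assumptions, not the cited implementation:

```python
import math

def widening_limit(t, C=1.0, alpha=0.5):
    """Maximum number of children allowed after t visits: k = ceil(C * t^alpha).

    C and alpha are assumed values for illustration; the cited papers tune them.
    """
    return math.ceil(C * t ** alpha)

class Node:
    """Minimal tree node: a visit counter and a map from actions to children."""
    def __init__(self):
        self.visits = 0
        self.children = {}  # action -> child Node

def visit(node, sample_action):
    """One progressive-widening decision at a node.

    A new action is sampled (via the default policy, here the hypothetical
    sample_action callable) only while k = ceil(C * t^alpha) exceeds the
    current number of children; otherwise selection proceeds among the
    existing children (e.g. by UCB).
    """
    node.visits += 1
    t = node.visits
    k = widening_limit(t)
    if k > len(node.children):
        a = sample_action()
        node.children.setdefault(a, Node())
    return node.children
```

With α = 0.5 the number of explored actions grows like the square root of the visit count, which is the point of progressive widening: the branching factor stays finite even over a continuous action space.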
“…n(s) is the number of simulations including state s, and n(s, a) is the number of simulations including action a in state s; they are all initialized to 0. Indeed, the implementation that we use (from the Mash project) is slightly more sophisticated and uses progressive widening [10], [11]; this is not relevant for this paper.…”
Section: A. The Rigorous Approach: Monte-Carlo Tree Search with Rejection
confidence: 99%
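The counters n(s) and n(s, a) described in this last statement are exactly what the standard UCB1 selection rule consumes once the child set has been (progressively) widened. A hedged sketch, where the exploration constant c = √2 and the stats layout are assumptions for illustration:

```python
import math

def ucb_select(stats, c=math.sqrt(2.0)):
    """Pick the action maximizing mean reward plus an exploration bonus.

    stats maps action -> (n_sa, total_reward), where n_sa is the number of
    simulations through (s, a) and total_reward is their summed return.
    n(s) is recovered as the sum of the per-action counts.
    """
    n_s = sum(n for n, _ in stats.values())  # n(s)

    def score(a):
        n_sa, total = stats[a]
        return total / n_sa + c * math.sqrt(math.log(n_s) / n_sa)

    return max(stats, key=score)
```

Note that a rarely tried action can outscore one with a higher empirical mean, since the bonus term √(log n(s) / n(s, a)) grows as n(s, a) shrinks; this is the exploration/exploitation trade-off the counters exist to support.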