2009
DOI: 10.1007/978-3-642-04174-7_20

Boosting Active Learning to Optimality: A Tractable Monte-Carlo, Billiard-Based Algorithm

Abstract: This paper focuses on Active Learning with a limited number of queries; in application domains such as Numerical Engineering, the size of the training set might be limited to a few dozen or hundred examples due to computational constraints. Active Learning under bounded resources is formalized as a finite horizon Reinforcement Learning problem, where the sampling strategy aims at minimizing the expectation of the generalization error. A tractable approximation of the optimal (intractable) policy is p…

Cited by 15 publications (12 citation statements)
References 28 publications
“…Usually these schedules restrict the number of children to grow as the logarithm or some root of the number of simulations; examples of each case can be found in [12] and [27] respectively. Unlike FPU, this approach could be very poor if legal actions for expansion are selected entirely randomly: even if initial actions look poor, the schedule prevents further exploration.…”
Section: Tree Policies for Large Branching Factors (mentioning)
confidence: 99%
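For concreteness, here is a minimal Python sketch of the two expansion schedules mentioned in the statement above, where the number of allowed children grows as the logarithm or as a root of the parent's visit count. The function names and the base/exponent values are illustrative assumptions, not taken from [12] or [27].

```python
import math

def max_children_log(num_simulations: int, base: float = 2.0) -> int:
    """Child budget grows logarithmically with the parent's visit count."""
    return max(1, int(math.log(num_simulations + 1, base)))

def max_children_root(num_simulations: int, exponent: float = 0.5) -> int:
    """Child budget grows as a root (here the square root) of the visit count."""
    return max(1, int(num_simulations ** exponent))

def may_expand(parent_visits: int, num_children: int, schedule=max_children_root) -> bool:
    """A node may only gain a new child once the schedule allows it."""
    return num_children < schedule(parent_visits)

# Example: with 100 visits the root schedule allows up to 10 children,
# while the base-2 logarithmic schedule allows only 6.
print(may_expand(parent_visits=100, num_children=9))   # True
print(may_expand(100, 9, schedule=max_children_log))   # False
```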
“…Unlike FPU, this approach could be very poor if legal actions for expansion are selected entirely randomly: even if initial actions look poor, the schedule prevents further exploration. For this reason, progressive widening orders the legal actions based on some quality heuristic [27] (such as an evaluation function), and expands them in decreasing order of the heuristic.…”
Section: Tree Policies for Large Branching Factors (mentioning)
confidence: 99%
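The heuristic-ordered variant of progressive widening described above can be sketched as follows; the Node class, the heuristic, and apply_action are illustrative placeholders rather than the implementation of [27]. Actions are ranked once by the heuristic, and each new expansion takes the best action not yet tried.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """Illustrative search-tree node: a state plus its expanded children."""
    state: object
    children: dict = field(default_factory=dict)

def order_actions(state, legal_actions, heuristic):
    """Rank the legal actions once, best first, according to the heuristic."""
    return sorted(legal_actions, key=lambda a: heuristic(state, a), reverse=True)

def expand_next(node, ordered_actions, apply_action):
    """Expand the best not-yet-tried action instead of a randomly chosen one."""
    for action in ordered_actions:
        if action not in node.children:
            node.children[action] = Node(apply_action(node.state, action))
            return node.children[action]
    return None  # node is fully expanded
```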
“…Indeed, MCTS has been recently used by Gaudel and Sebag (2010) in their FUSE (Feature Uct SElection) system to perform feature selection, and by Rolet et al. (2009) in BAAL (Bandit-based Active Learner) for active learning with small training sets. Gaudel and Sebag (2010) first formalize feature selection as a Reinforcement Learning (RL) problem and then provide an approximation of the optimal policy by casting the RL problem as a one-player game whose states are all possible subsets of features and whose actions consist of choosing a feature and adding it to a subset.…”
Section: Related Work (mentioning)
confidence: 99%
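A minimal sketch of the one-player game structure described in that statement, assuming features are indexed by integers: a state is a subset of features and an action adds one feature. The names are illustrative and this is not the FUSE code.

```python
from typing import FrozenSet, Iterable, List

def legal_actions(state: FrozenSet[int], all_features: Iterable[int]) -> List[int]:
    """An action chooses any feature not yet in the current subset."""
    return [f for f in all_features if f not in state]

def play(state: FrozenSet[int], feature: int) -> FrozenSet[int]:
    """Playing an action adds the chosen feature, giving the next state."""
    return state | {feature}

# Example over 5 features, starting from the empty subset.
state = frozenset()
state = play(state, 3)
print(sorted(state))                   # [3]
print(legal_actions(state, range(5)))  # [0, 1, 2, 4]
```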
“…The problem is then solved with the UCT approach, leading to the FUSE algorithm. Rolet et al. (2009) focus on Active Learning (AL) with a limited number of queries. The authors formalize AL under bounded resources as a finite horizon RL problem.…”
Section: Related Work (mentioning)
confidence: 99%
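A schematic sketch of the finite-horizon formulation described above: a sampling policy spends a bounded query budget and is then scored by the generalization error of the resulting model, the quantity whose expectation the optimal policy minimizes. The arguments policy, oracle, fit, and generalization_error are hypothetical placeholders, not the BAAL algorithm itself.

```python
def active_learning_episode(policy, oracle, pool, fit, generalization_error, horizon):
    """One finite-horizon episode: spend the query budget, then score the model."""
    labeled = []                        # (x, y) pairs acquired so far
    for t in range(horizon):            # bounded query budget = finite horizon
        x = policy(labeled, pool, t)    # sampling strategy picks the next query
        labeled.append((x, oracle(x)))  # label it (e.g. run the costly simulation)
        pool.remove(x)
    model = fit(labeled)                # train on the final labeled set
    return generalization_error(model)  # expectation of this is what the policy minimizes
```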