2020
DOI: 10.1145/3379476
Non-Asymptotic Analysis of Monte Carlo Tree Search

Abstract: In this work, we consider a popular tree-based search strategy within the framework of reinforcement learning, Monte Carlo Tree Search (MCTS), in the context of an infinite-horizon discounted-cost Markov Decision Process (MDP) with deterministic transitions. While MCTS is believed to provide an approximate value function for a given state with enough simulations, cf. [5,6], the claimed proof of this property is incomplete. This is due to the fact that the variant of MCTS, the Upper Confidence Bound for Tree…
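The setting the abstract describes — estimating the discounted cost-to-go of a state by simulation — can be sketched minimally as a random rollout on a deterministic MDP. The `step` and `cost` interfaces, the action set, and the discount factor below are illustrative assumptions, not taken from the paper:

```python
import random

GAMMA = 0.9  # discount factor (illustrative choice)

def rollout_value(state, step, cost, horizon=50):
    """Estimate the discounted cost-to-go of `state` with one random rollout.

    `step(state, action) -> next_state` is a deterministic transition and
    `cost(state, action) -> float` a per-step cost; both are assumed
    interfaces for this sketch. A truncated horizon stands in for the
    infinite-horizon sum.
    """
    total, discount = 0.0, 1.0
    for _ in range(horizon):
        action = random.choice([0, 1])        # illustrative two-action set
        total += discount * cost(state, action)
        discount *= GAMMA
        state = step(state, action)
    return total
```

MCTS refines such crude rollout estimates by growing a search tree and reusing statistics across simulations; the paper's point is that the rate at which those estimates concentrate is what needs a careful, non-asymptotic argument.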

Cited by 8 publications (14 citation statements) · References 0 publications
“…When the base algorithm has convergence guarantees, such as UCRL, we can additionally provide guarantees on the rate of convergence. We provide these rates and a discussion of the UCRL case in the appendix: our analysis draws upon the analysis of convergence rates for Monte Carlo Tree Search from Shah et al (Shah, Xie, and Xu 2020).…”
Section: Brief Theoretical Discussion (mentioning)
Confidence: 99%
“…In the case where the learning algorithms under constraints have regret guarantees, such as UCRL, we can closely follow the techniques of Shah et al (Shah, Xie, and Xu 2020) to provide a concentration property. In this subsection we will first show this generally and then provide discussion for UCRL.…”
Section: Convergence When the Algorithms Have Guarantees on the Rates… (mentioning)
Confidence: 99%
“…|I| k ] that maps joint observation, z, to joint action, a. Adapting the proof concept in [6] to our setting, the policy improvement can be shown by validating the following two properties and then iterating: (11) and, after generating an appropriate dataset, (ii) learning…”
Section: A Meta Self-Improving Algorithm (mentioning)
Confidence: 99%
“…MCTS frequently uses the Upper Confidence Bound for Trees algorithm [4] that uses a discrete-action, multi-armed bandit solution [5] to balance exploration and exploitation in node selection. Recent work uses a non-stationary bandit analysis to propose a polynomial, rather than logarithmic, exploration term [6].…”
Section: Introduction (mentioning)
Confidence: 99%
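The exploration-term contrast this citation draws can be made concrete with a small sketch. The logarithmic bonus is the classical UCB1/UCT form; the polynomial bonus below only illustrates the *shape* of the alternative — its exponents are placeholders, not the constants derived in the cited analysis. Selection follows the cost-minimization convention of the discounted-cost MDP setting:

```python
import math

def log_bonus(t, n, c=2.0):
    """Classical logarithmic UCB exploration term used in UCT."""
    return c * math.sqrt(math.log(t) / n)

def poly_bonus(t, n, alpha=0.5, beta=1.0):
    """Polynomial exploration term; alpha and beta are illustrative
    placeholders, not the exponents from the non-stationary analysis."""
    return beta * (t ** alpha) / n

def select_action(stats, t, bonus):
    """Pick the action minimizing empirical mean cost minus the bonus.

    `stats` maps action -> (visit count n, accumulated cost total).
    Unvisited actions are tried first.
    """
    def score(item):
        _, (n, total) = item
        return -math.inf if n == 0 else total / n - bonus(t, n)
    return min(stats.items(), key=score)[0]
```

Swapping `log_bonus` for `poly_bonus` changes only how aggressively under-visited children keep being revisited as the total visit count `t` grows, which is exactly the knob the non-stationary bandit analysis adjusts.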