2020 22nd International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC)
DOI: 10.1109/synasc51798.2020.00033

Tackling Morpion Solitaire with AlphaZero-like Ranked Reward Reinforcement Learning

Abstract: Morpion Solitaire is a popular single-player game, played with paper and pencil. Due to its large state space (on the order of the game of Go), traditional search algorithms, such as MCTS, have not been able to find good solutions. A later algorithm, Nested Rollout Policy Adaptation, was able to find a new record of 82 steps, albeit with large computational resources. After achieving this record, to the best of our knowledge, there has been no further progress reported for about a decade. In this paper we ta…
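The abstract names Nested Rollout Policy Adaptation (NRPA, Rosin 2011) as the algorithm behind the 82-step record. As background, here is a minimal NRPA sketch on a toy digit-picking game — not the paper's Morpion Solitaire setup; the toy game, `GOAL_LEN`, `alpha`, and the iteration count are all illustrative assumptions:

```python
import math
import random

# Toy single-player game (illustrative, not Morpion Solitaire): pick a digit
# from {0, 1, 2} at each of 5 steps; the score is the sum of the chosen digits.
GOAL_LEN = 5

def legal_moves(state):
    return [0, 1, 2] if len(state) < GOAL_LEN else []

def code(state, move):
    # A move is identified by (position, digit), so the learned policy
    # weights carry over between playouts reaching the same decision point.
    return (len(state), move)

def playout(policy):
    """One rollout: sample each move with probability proportional to exp(weight)."""
    state, sequence = (), []
    while legal_moves(state):
        moves = legal_moves(state)
        weights = [math.exp(policy.get(code(state, m), 0.0)) for m in moves]
        move = random.choices(moves, weights=weights)[0]
        sequence.append((state, move))
        state = state + (move,)
    return sum(state), sequence

def adapt(policy, sequence, alpha=1.0):
    """Shift the policy toward the best sequence found so far (softmax gradient step)."""
    new_policy = dict(policy)
    for state, move in sequence:
        moves = legal_moves(state)
        z = sum(math.exp(policy.get(code(state, m), 0.0)) for m in moves)
        for m in moves:
            prob = math.exp(policy.get(code(state, m), 0.0)) / z
            new_policy[code(state, m)] = new_policy.get(code(state, m), 0.0) - alpha * prob
        new_policy[code(state, move)] = new_policy.get(code(state, move), 0.0) + alpha
    return new_policy

def nrpa(level, policy, iterations=40):
    """Nested search: level 0 is a plain playout; higher levels adapt toward their best result."""
    if level == 0:
        return playout(policy)
    best_score, best_sequence = float("-inf"), []
    for _ in range(iterations):
        score, sequence = nrpa(level - 1, policy)
        if score >= best_score:
            best_score, best_sequence = score, sequence
        policy = adapt(policy, best_sequence)
    return best_score, best_sequence

random.seed(0)
best_score, best_sequence = nrpa(2, {})
print(best_score, [m for _, m in best_sequence])
```

Since `adapt` returns a fresh dictionary, each recursive call effectively works on its own copy of the policy, as in Rosin's formulation; on Morpion Solitaire the same scheme is applied with moves as line placements and the score as the number of moves played.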

Cited by 10 publications (3 citation statements)
References 25 publications
“…More detailed results show that using semisplit in simulations gives a large efficiency boost and generally is not harmful in terms of their quality. This may be useful also apart from game playing, as simulations are applied to many single-player problems (Wang et al 2020). On the other hand, using semisplit in the MCTS tree gives some benefits in the quality of iterations, yet it is riskier, as sometimes it is consistently harmful (e.g., The Mill Game).…”
Section: Discussion (mentioning)
confidence: 99%
“…We set parameter values according to Table 1. The parameter choices are based on [Wang et al., 2020a].…”
Section: Methods (mentioning)
confidence: 99%
“…There are many interesting works on self-play in reinforcement learning [Tesauro, 1995; Runarsson and Lucas, 2005; Plaat, 2020]. Temporal difference learning for acquiring position evaluation in small board Go with co-evolution has been compared to self-play [Runarsson and Lucas, 2005].…”
Section: Related Work (mentioning)
confidence: 99%