2020 22nd International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC)
DOI: 10.1109/synasc51798.2020.00033

Tackling Morpion Solitaire with AlphaZero-like Ranked Reward Reinforcement Learning

Abstract: Morpion Solitaire is a popular single-player game, played with paper and pencil. Due to its large state space (on the order of the game of Go), traditional search algorithms, such as MCTS, have not been able to find good solutions. A later algorithm, Nested Rollout Policy Adaptation, was able to find a new record of 82 steps, albeit with large computational resources. After achieving this record, to the best of our knowledge, there has been no further progress reported for about a decade. In this paper we ta…
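The abstract names Nested Rollout Policy Adaptation (NRPA, Rosin 2011) as the algorithm behind the 82-step record. As background, here is a minimal NRPA sketch on a toy digit-picking game — not the paper's Morpion Solitaire setup; the toy game, `GOAL_LEN`, `alpha`, and the iteration count are all illustrative assumptions:

```python
import math
import random

# Toy single-player game (illustrative, not Morpion Solitaire): pick a digit
# from {0, 1, 2} at each of 5 steps; the score is the sum of the chosen digits.
GOAL_LEN = 5

def legal_moves(state):
    return [0, 1, 2] if len(state) < GOAL_LEN else []

def code(state, move):
    # A move is identified by (position, digit), so the learned policy
    # weights carry over between playouts reaching the same decision point.
    return (len(state), move)

def playout(policy):
    """One rollout: sample each move with probability proportional to exp(weight)."""
    state, sequence = (), []
    while legal_moves(state):
        moves = legal_moves(state)
        weights = [math.exp(policy.get(code(state, m), 0.0)) for m in moves]
        move = random.choices(moves, weights=weights)[0]
        sequence.append((state, move))
        state = state + (move,)
    return sum(state), sequence

def adapt(policy, sequence, alpha=1.0):
    """Shift the policy toward the best sequence found so far (softmax gradient step)."""
    new_policy = dict(policy)
    for state, move in sequence:
        moves = legal_moves(state)
        z = sum(math.exp(policy.get(code(state, m), 0.0)) for m in moves)
        for m in moves:
            prob = math.exp(policy.get(code(state, m), 0.0)) / z
            new_policy[code(state, m)] = new_policy.get(code(state, m), 0.0) - alpha * prob
        new_policy[code(state, move)] = new_policy.get(code(state, move), 0.0) + alpha
    return new_policy

def nrpa(level, policy, iterations=40):
    """Nested search: level 0 is a plain playout; higher levels adapt toward their best result."""
    if level == 0:
        return playout(policy)
    best_score, best_sequence = float("-inf"), []
    for _ in range(iterations):
        score, sequence = nrpa(level - 1, policy)
        if score >= best_score:
            best_score, best_sequence = score, sequence
        policy = adapt(policy, best_sequence)
    return best_score, best_sequence

random.seed(0)
best_score, best_sequence = nrpa(2, {})
print(best_score, [m for _, m in best_sequence])
```

Since `adapt` returns a fresh dictionary, each recursive call effectively works on its own copy of the policy, as in Rosin's formulation; on Morpion Solitaire the same scheme is applied with moves as line placements and the score as the number of moves played.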

Cited by 10 publications (3 citation statements)
References 25 publications
“…More detailed results show that using semisplit in simulations gives a large efficiency boost and generally is not harmful in terms of their quality. This may be useful also apart from game playing, as simulations are applied to many single-player problems (Wang et al 2020). On the other hand, using semisplit in the MCTS tree gives some benefits in the quality of iterations, yet it is riskier, as sometimes it is consistently harmful (e.g., The Mill Game).…”
Section: Discussion (mentioning)
confidence: 99%
“…We set parameter values according to Table 1. The parameter choices are based on [Wang et al., 2020a].…”
Section: Methods (mentioning)
confidence: 99%
“…There are many interesting works on self-play in reinforcement learning [Tesauro, 1995; Runarsson and Lucas, 2005; Plaat, 2020]. Temporal difference learning for acquiring position evaluation in small board Go with co-evolution has been compared to self-play [Runarsson and Lucas, 2005].…”
Section: Related Work (mentioning)
confidence: 99%