2016
DOI: 10.1109/tciaig.2014.2367105

Online Adaptable Learning Rates for the Game Connect-4

Abstract: Learning board games by self-play has a long tradition in computational intelligence for games. Based on Tesauro's seminal success with TD-Gammon in 1994, many successful agents use temporal difference learning today. But in order to be successful with temporal difference learning on game tasks, often a careful selection of features and a large number of training games is necessary. Even for board games of moderate complexity like Connect-4, we found in previous work that a very rich initial feature set and se…

Cited by 18 publications (27 citation statements) · References 20 publications

“…For linear function approximation, the list of online adaptive learning algorithms includes Beal's Temporal Coherence [3], Dabney's α-bounds [9], Sutton's IDBD [29], and Mahmood's Autostep [16]. A thorough comparison of these and other methods has recently been performed by Bagheri et al. [2] on Connect 4. It was shown that, while the learning rate adaptation methods clearly outperform standard TD, the difference between them is not substantial, especially in the long run.…”
Section: B. Automatic Adaptive Learning Rate
confidence: 99%
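
The adaptive-rate methods named in the statement above (Temporal Coherence, α-bounds, IDBD, Autostep) all tune per-weight step sizes online. As a minimal sketch of the simplest of them, the following assumes Beal-style Temporal Coherence on top of linear TD(0); the class name, the linear setup, and the initial step size are illustrative assumptions, not the implementation compared by Bagheri et al.:

import numpy as np

class TCLinearTD:
    """Linear TD(0) with Temporal Coherence (TC) step-size adaptation (sketch)."""

    def __init__(self, n_features, alpha_init=1.0, gamma=1.0):
        self.w = np.zeros(n_features)   # value-function weights, V(s) = w . phi(s)
        self.N = np.zeros(n_features)   # accumulated signed weight changes
        self.A = np.zeros(n_features)   # accumulated absolute weight changes
        self.alpha_init = alpha_init
        self.gamma = gamma

    def update(self, phi, reward, phi_next, terminal):
        # standard TD(0) error for the linear approximation
        v = self.w @ phi
        v_next = 0.0 if terminal else self.w @ phi_next
        delta = reward + self.gamma * v_next - v

        # per-weight step size |N_i| / A_i: near 1 while past updates keep the
        # same sign ("coherent"), shrinking toward 0 when they oscillate
        alpha = np.full_like(self.w, self.alpha_init)
        seen = self.A > 0
        alpha[seen] = np.abs(self.N[seen]) / self.A[seen]

        change = delta * phi            # recommended weight change per feature
        self.w += alpha * change
        self.N += change
        self.A += np.abs(change)
        return delta
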
“…The algorithm used the standard TD(0) rule [30] to learn the weights of an n-tuple network approximating the afterstate-value function. In this paper, we extend this method in several directions.…”
Section: Introduction
confidence: 99%
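
For context, the TD(0) afterstate update with an n-tuple network mentioned in the statement above can be sketched as follows. This is only an illustrative toy: the tuple layout, board encoding, and function names are assumptions, not the quoted 2048 implementation.

from collections import defaultdict

TUPLES = [(0, 1, 2, 3), (4, 5, 6, 7)]   # toy n-tuple layout over board indices

class NTupleNetwork:
    def __init__(self):
        # one lookup table per tuple; keys are the cell values covered by the tuple
        self.luts = [defaultdict(float) for _ in TUPLES]

    def value(self, board):
        return sum(lut[tuple(board[i] for i in t)]
                   for lut, t in zip(self.luts, TUPLES))

    def update(self, board, target, alpha):
        # distribute the TD error equally over all active table entries
        error = target - self.value(board)
        for lut, t in zip(self.luts, TUPLES):
            lut[tuple(board[i] for i in t)] += alpha * error / len(TUPLES)

def td0_step(net, afterstate, reward_next, next_afterstate, alpha=0.01):
    # TD(0) target for the afterstate value: next reward plus value of next afterstate
    target = reward_next + net.value(next_afterstate)
    net.update(afterstate, target, alpha)
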
“…An empirical study to evaluate playing quality is conducted and concludes that new video-encoding adaptation strategies can be recommended. In [18], a variant of Temporal Coherence Learning with geometric step-size changes was proposed. The authors showed that this algorithm outperformed variants with a constant step-size change.…”
Section: Previous Work
confidence: 99%
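
The "geometric step size" variant referenced as [18] changes the step size multiplicatively rather than setting it directly from the coherence ratio. One plausible, purely illustrative reading (the exponential transfer function and the constant BETA below are assumptions, not the published formulation) is:

import numpy as np

BETA = 2.7  # illustrative constant controlling how sharply the step size decays

def tcl_geometric_step_sizes(N, A, alpha0=1.0):
    """Per-weight step sizes from accumulated signed (N) and absolute (A) updates."""
    ratio = np.where(A > 0, np.abs(N) / np.where(A > 0, A, 1.0), 1.0)  # coherence in [0, 1]
    return alpha0 * np.exp(BETA * (ratio - 1.0))  # geometric scaling, at most alpha0
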
“…Many methods were proposed to design AI programs for 2048 and Threes in the past. Most commonly used methods were alpha-beta search [10][15][18], a traditional game search method for two-player games, and expectimax search [2][12][18], a common game search method for single-player stochastic games. Recently, Szubert and Jaśkowski [21] proposed Temporal Difference (TD) learning together with n-tuple networks for 2048.…”
Section: Introduction
confidence: 99%
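
To make the contrast between the two search families concrete, a depth-limited expectimax for a single-player stochastic game such as 2048 can be sketched as below; the environment hooks (legal_moves, apply_move, chance_outcomes, evaluate) are hypothetical placeholders, not any cited program's API.

def expectimax(state, depth, env):
    # max node: the player picks the move with the highest expected value
    if depth == 0 or not env.legal_moves(state):
        return env.evaluate(state)          # heuristic leaf evaluation

    best = float("-inf")
    for move in env.legal_moves(state):
        afterstate, reward = env.apply_move(state, move)
        # chance node: average over random outcomes, weighted by probability
        expected = sum(prob * expectimax(next_state, depth - 1, env)
                       for next_state, prob in env.chance_outcomes(afterstate))
        best = max(best, reward + expected)
    return best
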