2016
DOI: 10.1109/tciaig.2014.2367105

Online Adaptable Learning Rates for the Game Connect-4

Abstract: Learning board games by self-play has a long tradition in computational intelligence for games. Based on Tesauro's seminal success with TD-Gammon in 1994, many successful agents use temporal difference learning today. But in order to be successful with temporal difference learning on game tasks, often a careful selection of features and a large number of training games is necessary. Even for board games of moderate complexity like Connect-4, we found in previous work that a very rich initial feature set and se…

Cited by 18 publications (27 citation statements) · References 20 publications

“…For linear function approximation, the list of online adaptive learning algorithms includes Beal's Temporal Coherence [3], Dabney's α-bounds [9], Sutton's IDBD [29], and Mahmood's Autostep [16]. A thorough comparison of these and other methods has recently been performed by Bagheri et al. [2] on Connect 4. It was shown that, while the learning rate adaptation methods clearly outperform standard TD, the difference between them is not substantial, especially in the long run.…”
Section: B. Automatic Adaptive Learning Rate
confidence: 99%
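
The adaptive-rate methods named in the statement above (Temporal Coherence, α-bounds, IDBD, Autostep) all tune per-weight step sizes online. As a minimal sketch of the simplest of them, the following assumes Beal-style Temporal Coherence on top of linear TD(0); the class name, the linear setup, and the initial step size are illustrative assumptions, not the implementation compared by Bagheri et al.:

import numpy as np

class TCLinearTD:
    """Linear TD(0) with Temporal Coherence (TC) step-size adaptation (sketch)."""

    def __init__(self, n_features, alpha_init=1.0, gamma=1.0):
        self.w = np.zeros(n_features)   # value-function weights, V(s) = w . phi(s)
        self.N = np.zeros(n_features)   # accumulated signed weight changes
        self.A = np.zeros(n_features)   # accumulated absolute weight changes
        self.alpha_init = alpha_init
        self.gamma = gamma

    def update(self, phi, reward, phi_next, terminal):
        # standard TD(0) error for the linear approximation
        v = self.w @ phi
        v_next = 0.0 if terminal else self.w @ phi_next
        delta = reward + self.gamma * v_next - v

        # per-weight step size |N_i| / A_i: near 1 while past updates keep the
        # same sign ("coherent"), shrinking toward 0 when they oscillate
        alpha = np.full_like(self.w, self.alpha_init)
        seen = self.A > 0
        alpha[seen] = np.abs(self.N[seen]) / self.A[seen]

        change = delta * phi            # recommended weight change per feature
        self.w += alpha * change
        self.N += change
        self.A += np.abs(change)
        return delta
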
“…The algorithm used the standard TD(0) rule [30] to learn the weights of an n-tuple network approximating the afterstate-value function. In this paper, we extend this method in several directions.…”
Section: Introduction
confidence: 99%
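
For context, the TD(0) afterstate update with an n-tuple network mentioned in the statement above can be sketched as follows. This is only an illustrative toy: the tuple layout, board encoding, and function names are assumptions, not the quoted 2048 implementation.

from collections import defaultdict

TUPLES = [(0, 1, 2, 3), (4, 5, 6, 7)]   # toy n-tuple layout over board indices

class NTupleNetwork:
    def __init__(self):
        # one lookup table per tuple; keys are the cell values covered by the tuple
        self.luts = [defaultdict(float) for _ in TUPLES]

    def value(self, board):
        return sum(lut[tuple(board[i] for i in t)]
                   for lut, t in zip(self.luts, TUPLES))

    def update(self, board, target, alpha):
        # distribute the TD error equally over all active table entries
        error = target - self.value(board)
        for lut, t in zip(self.luts, TUPLES):
            lut[tuple(board[i] for i in t)] += alpha * error / len(TUPLES)

def td0_step(net, afterstate, reward_next, next_afterstate, alpha=0.01):
    # TD(0) target for the afterstate value: next reward plus value of next afterstate
    target = reward_next + net.value(next_afterstate)
    net.update(afterstate, target, alpha)
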
“…An empirical study to evaluate playing quality is conducted and concludes that new video-encoding adaptation strategies can be recommended. In [18], a variant of Temporal Coherence Learning with geometric step-size changes was proposed. The authors showed that this algorithm outperformed variants with a constant step-size change.…”
Section: Previous Work
confidence: 99%
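
The "geometric step size" variant referenced as [18] changes the step size multiplicatively rather than setting it directly from the coherence ratio. One plausible, purely illustrative reading (the exponential transfer function and the constant BETA below are assumptions, not the published formulation) is:

import numpy as np

BETA = 2.7  # illustrative constant controlling how sharply the step size decays

def tcl_geometric_step_sizes(N, A, alpha0=1.0):
    """Per-weight step sizes from accumulated signed (N) and absolute (A) updates."""
    ratio = np.where(A > 0, np.abs(N) / np.where(A > 0, A, 1.0), 1.0)  # coherence in [0, 1]
    return alpha0 * np.exp(BETA * (ratio - 1.0))  # geometric scaling, at most alpha0
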
“…Many methods were proposed to design AI programs for 2048 and Threes in the past. Most commonly used methods were alpha-beta search [10][15][18], a traditional game search method for two-player games, and expectimax search [2][12][18], a common game search method for single-player stochastic games. Recently, Szubert and Jaśkowski [21] proposed Temporal Difference (TD) learning together with n-tuple networks for 2048.…”
Section: Introduction
confidence: 99%
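
To make the contrast between the two search families concrete, a depth-limited expectimax for a single-player stochastic game such as 2048 can be sketched as below; the environment hooks (legal_moves, apply_move, chance_outcomes, evaluate) are hypothetical placeholders, not any cited program's API.

def expectimax(state, depth, env):
    # max node: the player picks the move with the highest expected value
    if depth == 0 or not env.legal_moves(state):
        return env.evaluate(state)          # heuristic leaf evaluation

    best = float("-inf")
    for move in env.legal_moves(state):
        afterstate, reward = env.apply_move(state, move)
        # chance node: average over random outcomes, weighted by probability
        expected = sum(prob * expectimax(next_state, depth - 1, env)
                       for next_state, prob in env.chance_outcomes(afterstate))
        best = max(best, reward + expected)
    return best
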