2009 IEEE Symposium on Computational Intelligence and Games
DOI: 10.1109/cig.2009.5286486

Coevolutionary Temporal Difference Learning for Othello

Abstract: This paper presents Coevolutionary Temporal Difference Learning (CTDL), a novel way of hybridizing coevolutionary search with reinforcement learning that works by interlacing one-population competitive coevolution with temporal difference learning. The coevolutionary part of the algorithm provides for exploration of the solution space, while the temporal difference learning performs its exploitation by local search. We apply CTDL to the board game of Othello, using weighted piece counter for representing playe…
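The weighted piece counter (WPC) representation mentioned in the abstract assigns one weight per board square and scores a position as the weighted sum of the squares' disc values. Below is a minimal sketch of such an evaluator and a greedy 1-ply policy built on it, assuming a board encoding of +1 for the learner's discs, -1 for the opponent's and 0 for empty; the function names and encoding are illustrative, not taken from the paper.

```python
# Minimal WPC sketch: the board is a flat list of 64 values in {+1, -1, 0}
# (+1 own disc, -1 opponent disc, 0 empty); `weights` holds one weight per square.

def wpc_value(board, weights):
    """Score a position as the weighted sum of its squares."""
    return sum(w * x for w, x in zip(weights, board))

def greedy_move(afterstates, weights):
    """1-ply policy: choose the afterstate (board after a legal move)
    with the highest WPC value."""
    return max(afterstates, key=lambda b: wpc_value(b, weights))
```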

Cited by 24 publications (24 citation statements), published 2011 to 2023
References 21 publications

Citation statements:
“…To quote Arthur Lee Samuel's declaration, The temptation to improve the machine's game by giving it standard openings or other man-generated knowledge of playing techniques has been consistently resisted (Samuel, 1959, p. 215). This result confirms our former observations (Szubert et al, 2009), when we demonstrated that hybridizing coevolution with TD(0) proves beneficial when learning the strategy of the game of Othello. Here, we come to similar conclusions for the game of small-board Go, and additionally note that extending the lookahead horizon by using TD(λ) with λ close to 1 can boost the performance of CTDL even further.…”
Section: Results (supporting)
confidence: 80%
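For readers unfamiliar with the TD(λ) variant referred to above, the sketch below shows how accumulating eligibility traces let a single prediction error update the weights of states visited earlier in the game, which is what setting λ close to 1 amounts to. The tanh squashing of the WPC output, the learning rate and all names are assumptions made for illustration; the exact update rule is in the cited papers.

```python
import math

def td_lambda_step(weights, traces, features, target, alpha=0.01, lam=0.9):
    """One TD(lambda) update for a value function V(s) = tanh(w . x(s)).

    `features` is the board vector of the state being updated, `target` the
    TD target (e.g. the value of the next state, or the outcome at game end).
    """
    dot = sum(w * x for w, x in zip(weights, features))
    value = math.tanh(dot)
    error = target - value                       # TD error for this state
    grad = 1.0 - value * value                   # d tanh(dot) / d dot
    for i, x in enumerate(features):
        traces[i] = lam * traces[i] + grad * x   # decayed eligibility of weight i
        weights[i] += alpha * error * traces[i]  # credit reaches earlier states
```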
“…4) speeds up the learning, the difference, initially substantial, becomes rather negligible after several hundreds of thousands of training games. Based on these results, we conclude that CTDL+HoF is moderately sensitive to the TDL-CEL ratio and recommend values greater than 8 for this parameter, which confirms our earlier findings for Othello (Szubert et al, 2009).…”
Section: Determining the Best TDL-CEL Ratio (supporting)
confidence: 78%
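The TDL-CEL ratio discussed above controls how the training budget is split between the two learning components. The sketch below reads the ratio as "TD self-play games per coevolutionary generation"; that reading, the callables passed in, and all names are hypothetical placeholders rather than the authors' API, and the precise definition should be taken from the cited papers.

```python
def ctdl_loop(weights, traces, population, *, generations, tdl_cel_ratio,
              td_selfplay_game, coevolve_generation):
    """Interleave TD learning (exploitation) with coevolution (exploration).

    `td_selfplay_game(weights, traces)` plays one self-play game and updates
    the weights in place; `coevolve_generation(population, weights)` runs one
    generation of one-population competitive coevolution and returns the new
    population. Both are caller-supplied placeholders.
    """
    for _ in range(generations):
        for _ in range(tdl_cel_ratio):           # e.g. a ratio greater than 8
            td_selfplay_game(weights, traces)
        population = coevolve_generation(population, weights)
    return weights, population
```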
“…The heuristics of the tested implementations were the same, incorporating the weighted piece counter and mobility metric as components of the evaluation function [5]. The weight matrix used for piece evaluation was the one obtained with coevolution by Szubert [20]. The same weight matrix was also used in move ordering to select the first move in the PV-nodes on the CPU.…”
Section: Testing Setup (mentioning)
confidence: 99%
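As a sketch of the evaluation described in this citing work, the snippet below combines a WPC term with a mobility term and reuses the WPC weights for move ordering. The mixing coefficient and all helper names are assumptions for illustration only, not the implementation from [5] or [20].

```python
def evaluate(board, wpc_weights, own_mobility, opp_mobility, mobility_coef=10.0):
    """Combine the weighted piece counter with a mobility-difference term."""
    wpc = sum(w * x for w, x in zip(wpc_weights, board))
    return wpc + mobility_coef * (own_mobility - opp_mobility)

def order_moves(moves_with_afterstates, wpc_weights):
    """Sort (move, afterstate) pairs by the WPC value of the afterstate,
    best first, e.g. to pick the move searched first at a PV-node."""
    return sorted(
        moves_with_afterstates,
        key=lambda ma: sum(w * x for w, x in zip(wpc_weights, ma[1])),
        reverse=True)
```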