Mimicking Go Experts with Convolutional Neural Networks

Sutskever, Ilya; Nair, Varun Sasidharan

doi:10.1007/978-3-540-87559-8_11

Cited by 29 publications

(22 citation statements)

References 15 publications

“…Most move predictors for Go are either using Neural Networks [6,7] or are estimating ratings for moves using the Bradley Terry (BT) model or related models [3,4,8]. Latter mentioned approaches model each move decision as a competition between players, the move chosen by the human expert player is then the winning player and its value is updated accordingly.…”

Section: Related Work (mentioning)

confidence: 99%

Move Prediction in Go – Modelling Feature Interactions Using Latent Factors

Wistuba

Schmidt-Thieme

2013

KI 2013: Advances in Artificial Intelligence

Abstract. Move prediction systems have always been part of strong Go programs. Recent research has revealed that taking interactions between features into account improves the performance of move predictions. In this paper, a factorization model is applied and a supervised learning algorithm, Latent Factor Ranking (LFR), which enables to consider these interactions, is introduced. Its superiority will be demonstrated in comparison to other state-of-the-art Go move predictors. LFR improves accuracy by 3% over current state-of-the-art Go move predictors on average and by 5% in the middle- and endgame of a game. Depending on the dimensionality of the shared, latent factor vector, an overall accuracy of over 41% is achieved.

“…Finally, two-layer neural networks were constructed, and the accuracy of the evaluation was as high as 25%. In 2008, Sutskever and Nair [33] also constructed a two-layer neural network and used the soft Max layer in the last layer to predict moves. The prediction accuracy corresponded to 37% on GoGoGoD dataset.…”

Section: Related Work (mentioning)

confidence: 99%

Improved Online Sequential Extreme Learning Machine: A New Intelligent Evaluation Method for AZ-Style Algorithms

Wei

et al. 2019

IEEE Access

Researches on computer games for Go, Chess, and Japanese Chess stand out as one of the notable landmarks in the progress of artificial intelligence. AlphaGo, AlphaGo Zero, and AlphaZero algorithms, which are called AlphaZero style (AZ-style) algorithms in some literature [1], have achieved superhuman performance by using deep reinforcement learning (DRL). However, the unavailability of training details, expensive equipment used for model training, and the low evaluation accuracy resulted by slow self-play training without expensive computing equipment in practical applications have been the defects of AZ-style algorithms. To solve the problems to a certain extent, the paper proposes an improved online sequential extreme learning machine (IOS-ELM), a new evaluation method, to evaluate chess board positions for AZ-style algorithms. Firstly, the theoretical principles of IOS-ELM are given. Secondly, the study considers Gomoku as the application object and uses IOS-ELM as the evaluation method for AZ-style's board positions to discuss the loss in the training process and hyperparameters affecting performance in detail. Under the same experimental conditions, the proposed method reduces the training parameters by 14 times, training time to 15%, and error of evaluation by 13% compared with the board evaluation network used in original AZ-style algorithms.

INDEX TERMS: Artificial intelligence, deep reinforcement learning, online sequential extreme learning machine, evaluation method, AlphaZero.

“…The game of GO is a very popular research framework and some methods incorporate knowledge from records of expert players. Sutskever and Nair [7] train a Convolutional Neural Network over professional games in order to predict how experts play the game, beating the state of the art. Other approaches [16] use a complex combination of online learning, transient learning, expert knowledge and patterns learned offline.…”

Section: B Similar Models For Learning In Complex Games (mentioning)

confidence: 99%

“…Yet, many agents created to date are the result of extensively analysing what a human would do when faced with the same set of options. There are many techniques for using human knowledge, including: supervised learning through human feedback [6], modelling human opponents to guide the sampling [7], offline learning of a policy from expert play [8], inverse reinforcement learning from human demonstration [9] and writing hand-crafted heuristics based on expert advice [10] (this is also used to inform different search algorithms [5]). These have generally increased the level of play of the deployed agent, but either require a lot of development work or are expensive to run.…”

Section: Introduction (mentioning)

confidence: 99%