This paper proposes a novel value network architecture for the game of Go, called a multi-labelled (ML) value network. In the ML value network, different values (win rates) are trained simultaneously for different settings of komi, the compensation given to balance the initiative of playing first. The ML value network has three advantages: (a) it outputs values for different komi, (b) it supports dynamic komi, and (c) it lowers the mean squared error (MSE). This paper also proposes a new dynamic komi method to improve game-playing strength.

This paper also performs experiments to demonstrate the merits of the architecture. First, the MSE of the ML value network is generally lower than that of a value network alone. Second, the program based on the ML value network wins 67.6% of games against the program based on the value network alone. Third, the program with the proposed dynamic komi method significantly improves playing strength over the baseline that does not use dynamic komi, especially for handicap games. To our knowledge, no programs using value networks have openly played handicap games to date. This paper provides these programs with a useful approach to playing handicap games.

Although the rules of Go are simple, its game tree complexity is extremely high, estimated to be 10^360 in [1][40]. It is common for players of different strengths to play h-stone handicap games, where the weaker player, usually designated to play as black, is allowed to place h stones first with a komi of 0.5 before white makes the first move. If the strength difference (rank difference) between the players is large, more handicap stones are usually given to the weaker player.

In the past, computer Go was listed as one of the AI grand challenges [16][28]. By 2006, the strengths of computer Go programs were generally below 6 kyu [5][8][14], far from amateur dan level. In 2006, Monte Carlo tree search (MCTS) [6][11][15][23][37] was invented, and computer Go programs began making significant progress [4][10][13], reaching roughly 6 dan by 2015. In 2016, this grand challenge was achieved by the program AlphaGo [34] when it defeated (4:1) Lee Sedol, a 9 dan grandmaster who had won the most world Go championship titles in the past decade. At the time, many thought this milestone was still a decade or more away. DeepMind, the team behind AlphaGo, published the techniques and methods of AlphaGo in Nature [34]. AlphaGo surpassed experts' expectations with a new method that uses three deep convolutional neural networks (DCNNs) [24][25]: a supervised learning (SL) policy network [7][9][18][26][38] that learns to predict experts' moves from human expert game records, a reinforcement learning (RL) policy network [27] that improves the SL policy network via self-play, and a value network that performs state evaluation based on self-play game simulations. AlphaGo then combined the DCNNs with MCTS for move generation during game play. In MCTS, a fast rollout policy was...
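To make the multi-labelled idea concrete, the sketch below shows one way such a value head could be wired up: a shared convolutional trunk followed by one win-rate output per komi setting, so all komi labels are trained jointly. The layer sizes, the two input planes, the komi range, and the class name MLValueNetwork are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class MLValueNetwork(nn.Module):
    """Minimal sketch of a multi-labelled (ML) value network.

    Instead of a single win-rate output, the head emits one win rate per
    candidate komi, so every komi setting shares the same feature trunk
    and is trained simultaneously.
    """

    def __init__(self, board_size: int = 19, num_komi: int = 21, channels: int = 64):
        super().__init__()
        # Shared convolutional trunk (stand-in for the paper's DCNN body).
        self.trunk = nn.Sequential(
            nn.Conv2d(2, channels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        # One output (win rate) per komi label, e.g. komi in {-9.5, ..., +10.5}.
        self.head = nn.Linear(channels * board_size * board_size, num_komi)

    def forward(self, board: torch.Tensor) -> torch.Tensor:
        features = self.trunk(board).flatten(start_dim=1)
        # Sigmoid maps each logit to a win rate in [0, 1] for its komi setting.
        return torch.sigmoid(self.head(features))


# Usage: evaluate a batch of positions and read off the win rate for one komi.
net = MLValueNetwork()
positions = torch.zeros(1, 2, 19, 19)   # placeholder input planes
win_rates = net(positions)              # shape: (1, num_komi)
```

Because every komi label is predicted from the same trunk, changing the komi at play time (dynamic komi) only means reading a different output, not re-evaluating the position with a different network.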