NeuroHex: A Deep Q-learning Hex Agent

Young, Kenny; Vasan, Gautham; Hayward, Ryan B.

doi:10.1007/978-3-319-57969-6_1

Cited by 9 publications

(24 citation statements)

References 15 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…MoHex2.0 plays the next move based on the playout simulations, which improves the quality using expert game records. As a result, a CNN has been proposed to create more accurate evaluation functions [24], [25]. The CNN is expected to learn position features that cannot be represented by network characteristics.…”

Section: B Conventional Methods To Build Value Functionsmentioning

confidence: 99%

“…Several reinforcement-learning algorithms that train value and policy functions independently have been proposed [8], Places corresponding to three mutually adjacent cells patterns become one. [24], and it has been demonstrated that CNNs provide greater evaluation accuracy than that of classical evaluation functions. In addition, the reinforcement-learning algorithm called Expert Iteration(ExIt), which trains two functions, has been proposed; the effectiveness of ExIt is shown on a 9×9 board [26].…”

Section: B Conventional Methods To Build Value Functionsmentioning

confidence: 99%

See 1 more Smart Citation

Reinforcement Learning to Create Value and Policy Functions Using Minimax Tree Search in Hex

2020

View full text Add to dashboard Cite

Recently, the use of reinforcement-learning algorithms has been proposed to create value and policy functions, and their effectiveness has been demonstrated using Go, Chess, and Shogi. In previous studies, the policy function is trained to predict the search probabilities of each move output by Monte Carlo Tree Search; thus, a number of simulations are required to obtain the search probabilities. We propose a reinforcementlearning algorithm with game of self-play to create value and policy functions such that the policy function is trained directly from the game results without the search probabilities. In this study, we use Hex, a board game developed by Piet Hein, to evaluate the proposed method. We demonstrate the effectiveness of the proposed learning algorithm in terms of the policy function accuracy, and play a tournament with the proposed computer Hex algorithm DeepEZO and 2017 world-champion programs. The tournament results demonstrate that DeepEZO outperforms all programs. DeepEZO achieved a winning percentage of 79.3% against the world-champion program MoHex2.0 under the same search conditions on 13×13 board. We also show that the highly accurate policy functions can be created by training the policy functions to increase the number of moves to be searched in the loser position.

show abstract

Section: B Conventional Methods To Build Value Functionsmentioning

confidence: 99%

Section: B Conventional Methods To Build Value Functionsmentioning

confidence: 99%

Reinforcement Learning to Create Value and Policy Functions Using Minimax Tree Search in Hex

2020

View full text Add to dashboard Cite

show abstract

“…Deep Q-Learning differs from the other methods mentioned earlier as Q-learning is not based on the search method. The use of Q-Learning on two-player board games has been studied before in several different types of games [9,15,[21][22][23][24]. In [22], Deep Q-Learning still found it difficult to beat the searching method in the Hex game even though it has been trained for two weeks (about 60,000 episodes).…”

Section: Introductionmentioning

confidence: 99%

“…The use of Q-Learning on two-player board games has been studied before in several different types of games [9,15,[21][22][23][24]. In [22], Deep Q-Learning still found it difficult to beat the searching method in the Hex game even though it has been trained for two weeks (about 60,000 episodes). The long training process is also shown in [24].…”

Section: Introductionmentioning

confidence: 99%

Maintain Agent Consistency in Surakarta Chess Using Dueling Deep Network With Increasing Batch

Rajagede

2022

IIUMEJ

View full text Add to dashboard Cite

Deep reinforcement learning usage in creating intelligent agents for various tasks has shown outstanding performance, particularly the Q-Learning algorithm. Deep Q-Network (DQN) is a reinforcement learning algorithm that combines the Q-Learning algorithm and deep neural networks as an approximator function. In the single-agent environment, the DQN model successfully surpasses human ability several times over. Still, when there are other agents in the environment, DQN may experience decreased performance. This research evaluated a DQN agent to play in the two-player traditional board game of Surakarta Chess. One of the drawbacks that we found when using DQN in two-player games is its consistency. The agent will experience performance degradation when facing different opponents. This research shows Dueling Deep Q-Network usage with increasing batch size can improve the agent's performance consistency. Our agent trained against a rule-based agent that acts based on the Surakarta Chess positional properties and was then evaluated using different rule-based agents. The best agent used Dueling DQN architecture with increasing batch size that produced a 57% average win rate against ten different agents after training for a short period. ABSTRAK: Pembelajaran Peneguhan Mendalam adalah terbaik apabila digunakan bagi mewujudkan ejen pintar dalam menyelesaikan pelbagai tugasan, terutama jika ia melibatkan algoritma Pembelajaran-Q. Algoritma Rangkaian-Q Mendalam (DQN) adalah Pembelajaran Peneguhan berasaskan gabungan algoritma Pembelajaran-Q dan rangkaian neural sebagai fungsi penghampiran. Melalui persekitaran ejen tunggal, model DQN telah beberapa kali berjaya mengatasi kemampuan manusia. Namun, ketika ejen lain berada dalam persekitaran ini, DQN mungkin kurang berjaya. Kajian ini melibatkan ejen DQN bermain papan tradisional iaitu Catur Surakarta dengan dua pemain. Salah satu kekurangan yang dijumpai adalah konsistensi. Ejen ini akan kurang bagus ketika berhadapan lawan berbeza. Kajian menunjukkan dengan penggunaan Rangkaian-Q Dwipertarungan Mendalam bersama peningkatan saiz kumpulan dapat meningkatkan konsistensi prestasi ejen. Ejen ini telah dilatih untuk melawan ejen lain berasaskan peraturan dan sifat kedudukan Catur Surakarta. Kemudian, ejen ini diuji berpandukan peraturan berbeza. Ejen terbaik adalah yang menggunakan rekaan DQN Dwipertarungan bersama peningkatan saiz kumpulan. Ianya berhasil memenangi permainan dengan purata 57% berbanding sepuluh agen lain melalui latihan jangka masa pendek.

show abstract

“…Recent studies show that compared with traditional rectangle-based CNN models, CNN models with hexagonshaped filters achieve better performance in applications such as Imaging Atmospheric Cherenkov Telescope (IACT) data analysis [2], [19], [20], [23], Hex move-prediction [27], and IceCube data analysis [8]. Applying hexagonal filters in group CNNs can even surpass the performance of traditional CNN models with image classification tasks on data sets such as CIFAR-10 [6], [22], [26].…”

Section: Introductionmentioning

confidence: 99%

HexCNN: A Framework for Native Hexagonal Convolutional Neural Networks

Zhao

Korn

et al. 2020

2020 IEEE International Conference on Data Mining (ICDM)

View full text Add to dashboard Cite

Hexagonal CNN models have shown superior performance in applications such as IACT data analysis and aerial scene classification due to their better rotation symmetry and reduced anisotropy. In order to realize hexagonal processing, existing studies mainly use the ZeroOut method to imitate hexagonal processing, which causes substantial memory and computation overheads. We address this deficiency with a novel native hexagonal CNN framework named HexCNN. HexCNN takes hexagon-shaped input and performs forward and backward propagation on the original form of the input based on hexagonshaped filters, hence avoiding computation and memory overheads caused by imitation. For applications with rectangle-shaped input but require hexagonal processing, HexCNN can be applied by padding the input into hexagon-shape as preprocessing. In this case, we show that the time and space efficiency of HexCNN still outperforms existing hexagonal CNN methods substantially. Experimental results show that compared with the state-ofthe-art models, which imitate hexagonal processing but using rectangle-shaped filters, HexCNN reduces the training time by up to 42.2%. Meanwhile, HexCNN saves the memory space cost by up to 25% and 41.7% for loading the input and performing convolution, respectively.

show abstract

NeuroHex: A Deep Q-learning Hex Agent

Cited by 9 publications

References 15 publications

Reinforcement Learning to Create Value and Policy Functions Using Minimax Tree Search in Hex

Reinforcement Learning to Create Value and Policy Functions Using Minimax Tree Search in Hex

Maintain Agent Consistency in Surakarta Chess Using Dueling Deep Network With Increasing Batch

HexCNN: A Framework for Native Hexagonal Convolutional Neural Networks

Contact Info

Product

Resources

About