2017
DOI: 10.48550/arxiv.1712.01815
Preprint

Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

David Silver,
Thomas Hubert,
Julian Schrittwieser
et al.

Abstract: The game of chess is the most widely-studied domain in the history of artificial intelligence. The strongest programs are based on a combination of sophisticated search techniques, domain-specific adaptations, and handcrafted evaluation functions that have been refined by human experts over several decades. In contrast, the AlphaGo Zero program recently achieved superhuman performance in the game of Go, by tabula rasa reinforcement learning from games of self-play. In this paper, we generalise this approach in…


citations
Cited by 318 publications
(427 citation statements)
references
References 7 publications
“…Deep Reinforcement Learning (RL) has gained many successes against humans in competitive games, such as Go [41], Dota [31], and StarCraft [48]. However, it still remains a challenge to build AI agents that can coordinate and collaborate with humans that the agents have not seen during training [20,24,5,40,18,21].…”
Section: Introduction (mentioning; confidence: 99%)

“…The mainstream method for building state-of-the-art AI agents is through self-play RL [44,41]. Self-play-trained agents are very specialized, and therefore suffered significantly from distributional shift when paired with humans.…”
Section: Introduction (mentioning; confidence: 99%)

“…Over the last few years, RL, grounded on combining classical theoretical results with deep learning and the functional approximation paradigm, has proved to be a fruitful approach to many artificial intelligence tasks from diverse domains. Breakthrough achievements include reaching human-level performance in such complex games as Go [184], and multi-player StarCraft II [206]. The generality of the reinforcement learning framework allows its application in both discrete and continuous spaces to solve tasks in both real and simulated environments [138].…”
Section: The Basics of Reinforcement Learning (mentioning; confidence: 99%)

“…The rule scheduling problem could be formulated as a reinforcement learning problem where the rule scheduler (the agent) in a given e-graph and its history (the environment) has to decide at a given iteration if a rule should be applied or not (the possible actions). Game mastering reinforcement learning techniques such as AlphaZero [49] or MuZero [47] could be adopted by viewing the rule scheduling problem as a game that the agent can win, especially in an automated theorem proving context. Viewing e-graph rule scheduling as a reinforcement learning task is thus a very interesting open research problem that, together with a proof production algorithm (subsection 4.1.2), may yield groundbreaking results in many applications such as automated theorem proving (subsection 4.3.1), compiler optimizations and symbolic mathematics whereas equality saturation with classical heuristics may take a very long time to optimize and prove equalities in the presence of very large input expressions and systems of equational rewrite rules.…”
Section: Smart Rule Scheduling Heuristics (mentioning; confidence: 99%)