ELF OpenGo: An Analysis and Open Reimplementation of AlphaZero

Tian, Yuandong; Ma, Jerry; Gong, Qucheng; Sengupta, Shubho; Chen, Zhuoyuan; Pinkerton, James; Zitnick, C. Lawrence

doi:10.48550/arxiv.1902.04522

Cited by 11 publications

(14 citation statements)

References 0 publications

“…It is not clear a priori whether the approach scales down in terms of resources. Although cheaper in several ways, the agents in (Silver et al 2018), (Schrittwieser et al 2019), (Tian et al 2019), and (Lee et al 2019) still use thousands of GPU's or hundreds of TPU's to master board games. The recent KataGo (Wu 2019) reaches the level of ELF using 1/50 of the computation and implements several techniques to accelerate the learning.…”

Section: Related Work (mentioning)

confidence: 99%

“…The amount of computational and financial resources that were required was so huge as to be out of reach for most, if not all, academic institutions. Not coincidentally these well-endowed projects and their follow-ups took place within giant multinational corporations of the IT sector (Tian et al 2019; Lee et al 2019). These companies deployed GPU's by the thousands and hundreds of TPU's.…”

Section: Introduction (mentioning)

confidence: 99%

OLIVAW: Mastering Othello without Human Knowledge, nor a Fortune

Norelli, Panconesi (2021). Preprint.

“…While our implementation was heavily influenced by several different open-source AlphaZero implementations [22], [33]-[35], our unusual use-case - training small agents on small boards - lead to some unusual design decisions.…”

Section: A Alphazero Implementation (mentioning)

confidence: 99%

“…1) Small networks: The original AlphaZero and its open-source replications used very large residual convnets. ELF OpenGo [35], for example, uses a 256-filter 20-block convolutional network, weighing in at roughly 20m parameters and 2 GF-s for a forward pass on a single sample. In our preliminary work however, we found that on the small boards we work with, far smaller - and faster - networks could make it to perfect play.…”

Section: A Alphazero Implementation (mentioning)

confidence: 99%

Scaling Scaling Laws with Board Games

Jones (2021). Preprint.

The largest experiments in machine learning now require resources far beyond the budget of all but a few institutions. Fortunately, it has recently been shown that the results of these huge experiments can often be extrapolated from the results of a sequence of far smaller, cheaper experiments. In this work, we show that not only can the extrapolation be done based on the size of the model, but on the size of the problem as well. By conducting a sequence of experiments using AlphaZero and Hex, we show that the performance achievable with a fixed amount of compute degrades predictably as the game gets larger and harder. Along with our main result, we further show that the test-time and train-time compute available to an agent can be traded off while maintaining performance.

“…Recent advances in deep reinforcement learning (RL) have given rise to systems that can outperform human experts at variety of games (Silver et al, 2017; Tian et al, 2019; OpenAI, 2018). These advances, even more-so than those from supervised learning, rely on significant numbers of training samples, making them impractical without large-scale, distributed parallelization.…”

Section: Introduction (mentioning)

confidence: 99%

DD-PPO: Learning Near-Perfect PointGoal Navigators from 2.5 Billion Frames

Wijmans, Kadian, Morcos et al. (2019). Preprint.

We present Decentralized Distributed Proximal Policy Optimization (DD-PPO), a method for distributed reinforcement learning in resource-intensive simulated environments. DD-PPO is distributed (uses multiple machines), decentralized (lacks a centralized server), and synchronous (no computation is ever 'stale'), making it conceptually simple and easy to implement. In our experiments on training virtual robots to navigate in Habitat-Sim (Savva et al., 2019), DD-PPO exhibits near-linear scaling - achieving a speedup of 107x on 128 GPUs over a serial implementation. We leverage this scaling to train an agent for 2.5 Billion steps of experience (the equivalent of 80 years of human experience) - over 6 months of GPU-time training in under 3 days of wall-clock time with 64 GPUs. This massive-scale training not only sets the state of art on Habitat Autonomous Navigation Challenge 2019, but essentially 'solves' the task - near-perfect autonomous navigation in an unseen environment without access to a map, directly from an RGB-D camera and a GPS+Compass sensor. Fortuitously, error vs computation exhibits a power-law-like distribution; thus, 90% of peak performance is obtained relatively early (at 100 million steps) and relatively cheaply (under 1 day with 8 GPUs). Finally, we show that the scene understanding and navigation policies learned can be transferred to other navigation tasks - the analog of 'ImageNet pre-training + task-specific fine-tuning' for embodied AI. Our model outperforms ImageNet pre-trained CNNs on these transfer tasks and can serve as a universal resource (all models and code are publicly available).
