Yotam Doron scite author profile

In this work we aim to solve a large collection of tasks using a single reinforcement learning agent with a single set of parameters. A key challenge is to handle the increased amount of data and extended training time. We have developed a new distributed agent IMPALA (Importance Weighted Actor-Learner Architecture) that not only uses resources more efficiently in singlemachine training but also scales to thousands of machines without sacrificing data efficiency or resource utilisation. We achieve stable learning at high throughput by combining decoupled acting and learning with a novel off-policy correction method called V-trace. We demonstrate the effectiveness of IMPALA for multi-task reinforcement learning on DMLab-30 (a set of 30 tasks from the DeepMind Lab environment (Beattie et al., 2016)) and Atari-57 (all available Atari games in Arcade Learning Environment (Bellemare et al., 2013a)). Our results show that IMPALA is able to achieve better performance than previous agents with less data, and crucially exhibits positive transfer between tasks as a result of its multi-task approach. The source code is publicly available at github.com/deepmind/scalable agent.

show abstract

Deep Reinforcement Learning and the Deadly Triad

Hasselt¹,

Doron²,

Strub³

et al. 2018

Preprint

View full text Add to dashboard Cite

dm_control: Software and tasks for continuous control

Tassa¹,

Tunyasuvunakool²,

Muldal³

et al. 2020

Software Impacts

103

View full text Add to dashboard Cite

The dm_control software package is a collection of Python libraries and task suites for reinforcement learning agents in an articulated-body simulation. Infrastructure includes a wrapper for the MuJoCo physics engine and libraries for procedural model manipulation and task authoring. Task suites include the Control Suite, a set of standardized tasks intended to serve as performance benchmarks, a locomotion framework and task families, and a set of manipulation tasks with a robot arm and snap-together bricks. An adjunct tech report and interactive tutorial are also provided.

show abstract

Behaviour Suite for Reinforcement Learning

Osband¹,

Doron²,

Hessel³

et al. 2019

Preprint

View full text Add to dashboard Cite

This paper introduces the Behaviour Suite for Reinforcement Learning, or bsuite for short. bsuite is a collection of carefully-designed experiments that investigate core capabilities of reinforcement learning (RL) agents with two objectives. First, to collect clear, informative and scalable problems that capture key issues in the design of general and efficient learning algorithms. Second, to study agent behaviour through their performance on these shared benchmarks. To complement this effort, we open source github.com/deepmind/bsuite, which automates evaluation and analysis of any agent on bsuite. This library facilitates reproducible and accessible research on the core issues in RL, and ultimately the design of superior learning algorithms. Our code is Python, and easy to use within existing projects. We include examples with OpenAI Baselines, Dopamine as well as new reference implementations. Going forward, we hope to incorporate more excellent experiments from the research community, and commit to a periodic review of bsuite from a committee of prominent researchers.

show abstract

Transformation-based Adversarial Video Prediction on Large-Scale Data

Luc¹,

Clark²,

Dieleman³

et al. 2020

Preprint

View full text Add to dashboard Cite

Recent breakthroughs in adversarial generative modeling have led to models capable of producing video samples of high quality, even on large and complex datasets of real-world video. In this work, we focus on the task of video prediction, where given a sequence of frames extracted from a video, the goal is to generate a plausible future sequence. We first improve the state of the art by performing a systematic empirical study of discriminator decompositions and proposing an architecture that yields faster convergence and higher performance than previous approaches. We then analyze recurrent units in the generator, and propose a novel recurrent unit which transforms its past hidden state according to predicted motion-like features, and refines it to to handle dis-occlusions, scene changes and other complex behavior. We show that this recurrent unit consistently outperforms previous designs. Our final model leads to a leap in the state-ofthe-art performance, obtaining a test set Fréchet Video Distance of 25.7, down from 69.2, on the large-scale Kinetics-600 dataset.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

customersupport@researchsolutions.com

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Yotam Doron

IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures

Deep Reinforcement Learning and the Deadly Triad

dm_control: Software and tasks for continuous control

Behaviour Suite for Reinforcement Learning

Transformation-based Adversarial Video Prediction on Large-Scale Data

Contact Info

Product

Resources

About