Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents

Machado, Marlos C.; Bellemare, Marc G.; Talvitie, Erik; Veness, Joel; Hausknecht, Matthew; Bowling, Michael

doi:10.48550/arxiv.1709.06009

Cited by 15 publications

(28 citation statements)

References 0 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…We empirically study the behavior of BDQN on a wide range of Atari games [16,44]. Since BDQN follows an efficient exploration-exploitation strategy, it reaches much higher cumulative rewards in fewer interactions, compared to its ε-greedy predecessor DDQN.…”

Section: Strategymentioning

confidence: 99%

Efficient Exploration Through Bayesian Deep Q-Networks

Azizzadenesheli¹,

Brunskill

Anandkumar

2018

2018 Information Theory and Applications Workshop (ITA)

View full text Add to dashboard Cite

We study reinforcement learning (RL) in high dimensional episodic Markov decision processes (MDP). We consider value-based RL when the optimal Q-value is a linear function of d-dimensional state-action feature representation. For instance, in deep-Q networks (DQN), the Q-value is a linear function of the feature representation layer (output layer). We propose two algorithms, one based on optimism, LINUCB, and another based on posterior sampling, LINPSRL. We guarantee frequentist and Bayesian regret upper bounds of O(d √ T ) for these two algorithms, where T is the number of episodes. We extend these methods to deep RL and propose Bayesian deep Q-networks (BDQN), which uses an efficient Thompson sampling algorithm for high dimensional RL. We deploy the double DQN (DDQN) approach, and instead of learning the last layer of Q-network using linear regression, we use Bayesian linear regression, resulting in an approximated posterior over Q-function. This allows us to directly incorporate the uncertainty over the Q-function and deploy Thompson sampling on the learned posterior distribution resulting in efficient exploration/exploitation trade-off. We empirically study the behavior of BDQN on a wide range of Atari games. Since BDQN carries out more efficient exploration and exploitation, it is able to reach higher return substantially faster compared to DDQN.

show abstract

Section: Strategymentioning

confidence: 99%

Efficient Exploration Through Bayesian Deep Q-Networks

Azizzadenesheli¹,

Brunskill

Anandkumar

2018

2018 Information Theory and Applications Workshop (ITA)

View full text Add to dashboard Cite

show abstract

“…Ruder (2017) provides a survey of multi-task learning in general, which, different from our problem of interest, considers a fixed finite population of tasks. Finn et al (2017) (Brockman et al, 2016) (which we build upon), the Arcade Learning Environment (Bellemare et al, 2013;Machado et al, 2017), DeepMind Lab (Beattie et al, 2016), and VizDoom (Kempka et al, 2016). The MuJoCo physics simulator (Todorov et al, 2012) has been influential in standardizing a number of continuous control tasks.…”

Section: Related Workmentioning

confidence: 99%

Assessing Generalization in Deep Reinforcement Learning

Packer¹,

Gao²,

Kos³

et al. 2018

Preprint

View full text Add to dashboard Cite

Deep reinforcement learning (RL) has achieved breakthrough results on many tasks, but agents often fail to generalize beyond the environment they were trained in. As a result, deep RL algorithms that promote generalization are receiving increasing attention. However, works in this area use a wide variety of tasks and experimental setups for evaluation. The literature lacks a controlled assessment of the merits of different generalization schemes. Our aim is to catalyze communitywide progress on generalization in deep RL. To this end, we present a benchmark and experimental protocol, and conduct a systematic empirical study. Our framework contains a diverse set of environments, our methodology covers both indistribution and out-of-distribution generalization, and our evaluation includes deep RL algorithms that specifically tackle generalization. Our key finding is that "vanilla" deep RL algorithms generalize better than specialized schemes that were proposed specifically to tackle generalization.

show abstract

“…The low computation cost of Pong allows us to do an extensive study of the bias-variance of Q for different model-based planning and exploration strategies. (Even for this game, GATS requires multiple weeks of GPU process for a short run of 5M time steps) Experimental results For this work, we also developed a new OpenAI gym (Brockman et al, 2016)-like interface for the latest Atari Learning Environment (ALE) (Machado et al, 2017), which supports different modes and difficulties for Atari games. We study the sample complexity required by GDM and RP to adapt and transfer from one domain of the game (a mode and difficulty) to another domain (another mode and difficulty).…”

Section: Technical Contributionsmentioning

confidence: 99%

Surprising Negative Results for Generative Adversarial Tree Search

Azizzadenesheli,

Yang,

Liu

et al. 2018

Preprint

View full text Add to dashboard Cite

While many recent advances in deep reinforcement learning rely on model-free methods, model-based approaches remain an alluring prospect for their potential to exploit unsupervised data to learn environment dynamics. One prospect is to pursue hybrid approaches, as in AlphaGo, which combines Monte-Carlo Tree Search (MCTS)-a model-based methodwith deep-Q networks (DQNs)-a model-free method. MCTS requires generating rollouts, which is computationally expensive. In this paper, we propose to simulate roll-outs, exploiting the latest breakthroughs in image-to-image transduction, namely Pix2Pix GANs, to predict the dynamics of the environment. Our proposed algorithm, generative adversarial tree search (GATS), simulates rollouts up to a specified depth using both a GAN-based dynamics model and a reward predictor. GATS employs MCTS for planning over the simulated samples and uses DQN to estimate the Q-function at the leaf states. Our theoretical analysis establishes some favorable properties of GATS vis-a-vis the bias-variance trade-off and empirical results show that on 5 popular Atari games, the dynamics and reward predictors converge quickly to accurate solutions. However, GATS fails to outperform DQNs. Notably, in these experiments, MCTS has only short rollouts (up to tree depth 4), while previous successes of MCTS have involved tree depth in the hundreds. We present a hypothesis for why tree search with short rollouts can fail even given perfect modeling.

show abstract

Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents

Cited by 15 publications

References 0 publications

Efficient Exploration Through Bayesian Deep Q-Networks

Efficient Exploration Through Bayesian Deep Q-Networks

Assessing Generalization in Deep Reinforcement Learning

Surprising Negative Results for Generative Adversarial Tree Search

Contact Info

Product

Resources

About