Behaviour Suite for Reinforcement Learning
2019 · Preprint
DOI: 10.48550/arxiv.1908.03568

Abstract: This paper introduces the Behaviour Suite for Reinforcement Learning, or bsuite for short. bsuite is a collection of carefully-designed experiments that investigate core capabilities of reinforcement learning (RL) agents with two objectives. First, to collect clear, informative and scalable problems that capture key issues in the design of general and efficient learning algorithms. Second, to study agent behaviour through their performance on these shared benchmarks. To complement this effort, we open source g…

Cited by 26 publications (35 citation statements) · References 26 publications
“…There is a substantial amount of meta-analysis work on online RL algorithms. While some focus on inadequacies in experimental protocols [Henderson et al., 2017, Osband et al., 2019], others study the roles of subtle implementation details in algorithms [Tucker et al., 2018, Engstrom et al., 2020, Andrychowicz et al., 2021, Furuta et al., 2021]. For example, Tucker et al. [2018] and Engstrom et al. [2020] identified that the superior performance of certain algorithms was more dependent on, or even accidentally due to, minor implementation details rather than algorithmic differences.…”
Section: Meta-Analyses of RL Algorithms
confidence: 99%
“…We evaluate our approach on the open-source RL Unplugged Atari dataset, where we show that R-BVE outperforms other offline RL methods. We show that R-BVE performs better on two more datasets: bsuite (Osband et al., 2019) and partially observable DeepMind Lab environments (Beattie et al., 2016). We provide careful ablations and analyses that provide insights into our proposed method and existing offline RL algorithms.…”
Section: Introduction
confidence: 80%
“…bsuite (Osband et al., 2019) is a proposed benchmark designed to highlight key aspects of an agent's scalability, such as exploration, memory, or credit assignment. We generated low-coverage offline RL datasets for catch, mountain_car and cartpole by recording the experiences of an online agent during training, as described by Agarwal et al. (2019a), and then subsampling them (see Appendix D.1 for details).…”
Section: Bsuite Experiments
confidence: 99%
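The dataset-generation recipe in the statement above (record an online agent's transitions during training, then subsample them to obtain a low-coverage offline dataset) can be sketched in plain Python. This is our own illustrative sketch, not code from the cited papers; the names `Transition` and `subsample_dataset` are hypothetical.

```python
import random
from collections import namedtuple

# A minimal transition record, as one might log during online training.
# (Hypothetical structure; the cited works define their own dataset formats.)
Transition = namedtuple("Transition", ["obs", "action", "reward", "next_obs", "done"])

def subsample_dataset(replay_log, fraction, seed=0):
    """Keep a random `fraction` of recorded transitions to make a
    low-coverage offline dataset."""
    rng = random.Random(seed)
    k = max(1, int(len(replay_log) * fraction))
    return rng.sample(replay_log, k)

# Toy example: a log of 1000 recorded transitions, subsampled to 10% coverage.
log = [Transition(obs=i, action=i % 3, reward=0.0, next_obs=i + 1, done=False)
       for i in range(1000)]
offline_dataset = subsample_dataset(log, fraction=0.1)
print(len(offline_dataset))  # 100
```

Subsampling deliberately reduces state-action coverage, which is what makes the resulting offline RL problem harder than learning from the full replay log.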
“…From OpenAI Gym (Brockman et al., 2016), LunarLander is a sparse-reward control environment and MountainCar a sparse-reward exploration environment. From BSuite (Osband et al., 2020), Cartpole-Noise is a dense-reward control environment. The BootstrapDQN baseline follows Osband et al. (2018), as explained in Section 2.3.…”
Section: IV-DQN
confidence: 99%