Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence 2018
DOI: 10.24963/ijcai.2018/787
Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents (Extended Abstract)

Abstract: The Arcade Learning Environment (ALE) is an evaluation platform that poses the challenge of building AI agents with general competency across dozens of Atari 2600 games. It supports a variety of different problem settings and it has been receiving increasing attention from the scientific community. In this paper we take a big picture look at how the ALE is being used by the research community. We focus on how diverse the evaluation methodologies in the ALE have become and we highlight some key concerns when ev…

Cited by 6 publications (4 citation statements). References 6 publications.
“…A large body of work has been built on these algorithms to address different challenges in reinforcement learning, including policy learning (Haarnoja et al. 2017), hierarchical learning (Klissarov et al. 2017), transfer learning (Wulfmeier, Posner & Abbeel 2017), and the emergence of complex behavior (Heess et al. 2017). Deep learning software such as Theano and TensorFlow, as well as the availability of source code for learning algorithms (e.g., Duan et al. 2016) and benchmark simulated environments (e.g., Brockman et al. 2016, Machado et al. 2018, Tassa et al. 2018), contributed to this advancement.…”
Section: Introduction
confidence: 99%
“…To improve the Q-value function we parameterize it by a neural network and update it as the agent collects new experience. Specifically, we evaluate the Q-values on a voxelated grid [35], which allows us to update the Q-value towards the highest final reward observed by the agent for a specific state-action pair (figure 1); for a deterministic problem, this puts a lower bound on the optimal Q-value [44]. Additionally, this discretization allows for easy inference of the optimal atom placement by simply finding the voxel which maximizes the Q-value.…”
Section: Theory
confidence: 99%
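To make the update rule in this excerpt concrete, here is a minimal tabular sketch, assuming a deterministic environment and an already-discretized (voxelized) state-action grid. The class and parameter names (VoxelQTable, n_state_voxels) are hypothetical illustrations, not from the cited paper, which parameterizes the Q-function with a neural network rather than an explicit table.

```python
import numpy as np

class VoxelQTable:
    """Hypothetical sketch: Q-values stored on a voxelized state-action grid."""

    def __init__(self, n_state_voxels: int, n_actions: int):
        # Initialize below any achievable return so the first
        # observed return always replaces the initial value.
        self.q = np.full((n_state_voxels, n_actions), -np.inf)

    def update(self, s: int, a: int, final_return: float) -> None:
        # Monotone update: keep the highest final reward observed so far
        # for this state-action pair. In a deterministic environment this
        # running maximum is a lower bound on the optimal Q-value Q*(s, a).
        self.q[s, a] = max(self.q[s, a], final_return)

    def best_action(self, s: int) -> int:
        # Inference: pick the action voxel that maximizes the Q-value.
        return int(np.argmax(self.q[s]))

# Usage sketch:
table = VoxelQTable(n_state_voxels=100, n_actions=8)
table.update(s=3, a=5, final_return=1.2)
table.update(s=3, a=5, final_return=0.7)  # ignored: 1.2 remains the max
print(table.best_action(3))  # -> 5
```

The -inf initialization reflects the max-based update: because the table only ever moves upward toward the best observed return, no optimistic prior is needed, and in a deterministic problem the stored values never overestimate the optimum.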