2020
DOI: 10.48550/arxiv.2002.05229
Preprint

Data Efficient Training for Reinforcement Learning with Adaptive Behavior Policy Sharing

Abstract: Deep Reinforcement Learning (RL) has proven powerful for decision making in simulated environments. However, training deep RL models is challenging in real-world applications such as production-scale health-care or recommender systems, because interaction is expensive and the budget at deployment is limited. One aspect of the data inefficiency comes from the expensive hyper-parameter tuning required when optimizing deep neural networks. We propose Adaptive Behavior Policy Sharing (ABPS), a data-efficient trainin…
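The abstract only sketches ABPS at a high level. Below is a minimal illustrative sketch of the stated idea: a pool of off-policy agents, each trained with different hyper-parameters, shares the experience gathered by a single behavior policy that is adaptively selected from the pool. All names here (`AgentStub`, `select_behavior`, `abps_train`), the bandit-style selection rule, and the environment interface are assumptions for illustration, not the paper's implementation.

```python
import random
from collections import deque

class AgentStub:
    """Placeholder for an off-policy learner (e.g. a DQN variant)."""
    def __init__(self, hyperparams):
        self.hyperparams = hyperparams

    def act(self, state):
        return random.randrange(2)  # stand-in for the learned policy

    def learn(self, batch):
        pass  # stand-in for a gradient update on the shared batch

def select_behavior(scores, eps=0.1):
    """Epsilon-greedy bandit over each agent's recent episode returns."""
    if random.random() < eps or not any(scores):
        return random.randrange(len(scores))
    means = [sum(s) / len(s) if s else float("-inf") for s in scores]
    return max(range(len(means)), key=means.__getitem__)

def abps_train(env, hyperparam_grid, episodes=100):
    pool = [AgentStub(h) for h in hyperparam_grid]   # ensemble of hyper-parameters
    scores = [deque(maxlen=10) for _ in pool]        # recent returns per agent
    replay = deque(maxlen=10_000)                    # shared experience buffer
    for _ in range(episodes):
        i = select_behavior(scores)                  # adaptively pick one behavior policy
        state, ep_return, done = env.reset(), 0.0, False
        while not done:
            action = pool[i].act(state)
            next_state, reward, done = env.step(action)  # assumed (s', r, done) interface
            replay.append((state, action, reward, next_state, done))
            state, ep_return = next_state, ep_return + reward
        scores[i].append(ep_return)
        batch = random.sample(replay, min(32, len(replay)))
        for agent in pool:                           # every agent learns from the experience
            agent.learn(batch)                       # the selected behavior policy collected
    return pool
```

In this sketch only one agent's policy acts at a time, yet every agent in the pool trains on the resulting transitions, so the cost of hyper-parameter search is amortized over a single stream of environment interaction.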

Cited by 1 publication (1 citation statement) · References 9 publications
“…The difficulty comes from the fact that the state-action space is exponentially large [9], where the optimal policy can not be explored efficiently. A mainstream strategy is to feed numerous trials to RL models in the optimization process to enhance performance [10][11][12][13][14][15][16][17]. Nevertheless, such a sample inefficiency challenges the applicability of RL towards large-scale problems [8], where the requested computational overhead is expensive or even unaffordable.…”
Section: Introduction
confidence: 99%