Pingli Lv scite author profile

Estimation bias seriously degrades the performance of reinforcement learning algorithms. The maximum operation can cause overestimation, while the double estimator operation often leads to underestimation. To eliminate this estimation bias, our proposed algorithm, the stochastic double deep Q-learning network (SDDQN), combines the two operations based on the idea of random selection. We also give a tabular version of SDDQN, named stochastic double Q-learning (SDQ). Both SDDQN and SDQ are built on the double estimator framework. At each step, the algorithm uses either the maximum operation or the double estimator operation, choosing between them with a probability determined by a random selection parameter. Theoretical analysis shows that there exists a proper value of the random selection parameter that makes SDDQN and SDQ unbiased. Experiments on Grid World and Atari 2600 games show that the proposed algorithms effectively balance the estimation bias and improve performance.
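The abstract does not state the exact SDQ update rule, so the Python sketch below is only one plausible reading of the tabular idea, not the authors' implementation. It assumes two Q-tables updated as in double Q-learning and a hypothetical selection parameter `p`: with probability `p` the bootstrap value comes from the maximum operation on the table being updated, otherwise from the double estimator. The helper `estimate_biases` illustrates, for a single noisy maximization, why a suitable `p` can cancel the bias: the maximum operation's bias is non-negative and the double estimator's is non-positive, so a linear mixture of the two can reach zero. All function and parameter names here are hypothetical.

```python
import random
from collections import defaultdict


def sdq_update(QA, QB, s, a, r, s_next, done, actions,
               alpha=0.1, gamma=0.99, p=0.5, rng=random):
    """One tabular stochastic double Q-learning step (illustrative sketch).

    Assumption (not from the paper): with probability p the bootstrap uses the
    maximum operation on the table being updated; otherwise it uses the
    double estimator (greedy action from one table, evaluated by the other).
    """
    # As in double Q-learning, update one of the two tables, chosen at random.
    if rng.random() < 0.5:
        q_upd, q_other = QA, QB
    else:
        q_upd, q_other = QB, QA
    if done:
        bootstrap = 0.0  # no bootstrapping from a terminal state
    else:
        # Greedy next action according to the table being updated.
        a_star = max(actions, key=lambda b: q_upd[(s_next, b)])
        if rng.random() < p:
            bootstrap = q_upd[(s_next, a_star)]    # maximum operation
        else:
            bootstrap = q_other[(s_next, a_star)]  # double estimator
    target = r + gamma * bootstrap
    q_upd[(s, a)] += alpha * (target - q_upd[(s, a)])


def estimate_biases(true_means, sigma=1.0, trials=10_000, seed=0):
    """Monte Carlo bias of the max and double estimators of max(true_means)."""
    rng = random.Random(seed)
    best = max(true_means)
    single_sum = double_sum = 0.0
    for _ in range(trials):
        # Two independent noisy estimates of every action value.
        est_a = [rng.gauss(m, sigma) for m in true_means]
        est_b = [rng.gauss(m, sigma) for m in true_means]
        single_sum += max(est_a)                    # maximum operation
        a_star = max(range(len(est_a)), key=est_a.__getitem__)
        double_sum += est_b[a_star]                 # double estimator
    return single_sum / trials - best, double_sum / trials - best


def unbiased_p(bias_max, bias_double):
    """Mixing probability p with p*bias_max + (1-p)*bias_double == 0.

    Assumes bias_max > 0 > bias_double (otherwise no such p in (0, 1)).
    """
    return bias_double / (bias_double - bias_max)
```

For example, `estimate_biases([0.0, -0.2, -0.4, -0.6, -0.8])` should report a positive bias for the maximum operation and a negative one for the double estimator, and `unbiased_p` then yields a mixing probability strictly between 0 and 1. This one-step mixture argument only illustrates the existence claim; the bias that accumulates during learning, and the paper's own analysis, may differ.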

Notice of Retraction: Integrated Double Estimator Architecture for Reinforcement Learning
Wang, Cheng, et al. (2022). IEEE Trans. Cybern.

New upper bounds on the L(2,1)-labeling of the skew and converse skew product graphs
Duan, Miao, et al. (2011). Theoretical Computer Science

Pingli Lv

Optimal Channel Assignment for Wireless Networks Modelled as Hexagonal and Square Grids

Stochastic Double Deep Q-Network

Notice of Retraction: Integrated Double Estimator Architecture for Reinforcement Learning

New upper bounds on the L(2,1)-labeling of the skew and converse skew product graphs