2019
DOI: 10.48550/arxiv.1910.05983
Preprint

On the Reduction of Variance and Overestimation of Deep Q-Learning

Mohammed Sabry,
Amr M. A. Khalifa

Abstract: The breakthrough of deep Q-Learning across different types of environments revolutionized the algorithmic design of Reinforcement Learning toward more stable and robust algorithms; to that end, many extensions to the deep Q-Learning algorithm have been proposed to reduce the variance of the target values and the overestimation phenomenon. In this paper, we examine a new methodology to address these issues: we propose applying Dropout techniques to the deep Q-Learning algorithm as a way to reduce variance and overestimation. …
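The abstract does not include an implementation, so the following is a minimal sketch, assuming a PyTorch setting, of how dropout layers could be placed inside a DQN-style Q-network and how several stochastic forward passes might be averaged to damp the variance of the target values. The layer sizes, dropout rate, and helper names are illustrative assumptions, not values taken from the paper.

```python
# Minimal sketch (not the authors' code): a DQN-style Q-network with dropout,
# assuming PyTorch. Layer sizes and the 0.1 dropout rate are illustrative.
import torch
import torch.nn as nn


class DropoutQNetwork(nn.Module):
    def __init__(self, state_dim: int, num_actions: int, p_drop: float = 0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128),
            nn.ReLU(),
            nn.Dropout(p=p_drop),      # dropout after each hidden layer
            nn.Linear(128, 128),
            nn.ReLU(),
            nn.Dropout(p=p_drop),
            nn.Linear(128, num_actions),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)


def averaged_target_q(q_net: DropoutQNetwork, next_state: torch.Tensor,
                      num_samples: int = 10) -> torch.Tensor:
    """Average several stochastic forward passes (dropout kept active)
    to obtain a lower-variance estimate of max_a Q(s', a)."""
    q_net.train()  # keep dropout active when forming the bootstrap value
    with torch.no_grad():
        samples = torch.stack([q_net(next_state) for _ in range(num_samples)])
    return samples.mean(dim=0).max(dim=-1).values
```

A target of the form `reward + gamma * (1 - done) * averaged_target_q(q_net, next_state)` would then replace the usual single-pass bootstrap value; this is one plausible reading of the approach, not the paper's exact procedure.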

Cited by 7 publications (7 citation statements)
References 15 publications
“…More recently, an optimization strategy called SVR-DQN was proposed, combining the stochastic variance reduced gradient (SVRG) technique (Johnson and Zhang 2013) with deep Q-learning. For more methods on variance reduction for deep Q-learning, refer to (Romoff et al. 2018; Sabry and Khalifa 2019). SVRG has also been applied to policy gradient methods in RL as an effective variance-reduction technique for stochastic optimization, such as off-line control (Xu, Liu, and Peng 2017), policy evaluation (Du et al. 2017), and on-policy control (Papini et al. 2018).…”
Section: Introduction
confidence: 99%
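SVR-DQN itself is not spelled out in this excerpt; as a rough sketch of the SVRG idea the citing paper points to, the snippet below applies a variance-reduced gradient correction to a Q-network mini-batch update, assuming PyTorch. The names (`snapshot_net`, `full_grads`, `loss_fn`) and the epoch bookkeeping are hypothetical, not SVR-DQN as published.

```python
# SVRG-style update sketch, assuming PyTorch. `q_net` is the current network,
# `snapshot_net` a frozen copy taken at the start of the epoch, and
# `full_grads` the per-parameter gradient of the loss over the full dataset
# (or replay buffer) evaluated at the snapshot. All names are hypothetical.
import torch


def svrg_step(q_net, snapshot_net, full_grads, loss_fn, batch, lr=1e-3):
    # Gradient of the mini-batch loss at the current parameters.
    q_net.zero_grad()
    loss_fn(q_net, batch).backward()
    # Gradient of the same mini-batch loss at the snapshot parameters.
    snapshot_net.zero_grad()
    loss_fn(snapshot_net, batch).backward()

    with torch.no_grad():
        for p, p_snap, mu in zip(q_net.parameters(),
                                 snapshot_net.parameters(),
                                 full_grads):
            # Variance-reduced gradient: g(w) - g(w_snapshot) + full gradient.
            vr_grad = p.grad - p_snap.grad + mu
            p -= lr * vr_grad

# Typical usage at the start of each SVRG epoch (hypothetical helpers):
#   snapshot_net = copy.deepcopy(q_net)
#   full_grads = compute_full_gradient(snapshot_net, replay_buffer)
```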
“…The DQN algorithm can provide optimal decision-making with minimal observations in relatively small and simple IoD environments. However, the DQN algorithm suffers from the overestimation issue, which causes a positive bias in the estimated action values, leading to sub-optimal policies [30].…”
Section: B. Overview of DRL Techniques for IoD
confidence: 99%
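The overestimation mentioned here stems from the max operator in the Q-learning target: with noisy estimates, E[max_a Q(s', a)] ≥ max_a E[Q(s', a)]. The toy snippet below, a sketch with made-up zero-mean noise rather than anything from the cited work, shows the upward bias of the standard DQN target and how a Double-DQN-style target that decouples action selection from evaluation avoids most of it.

```python
# Toy illustration of max-operator overestimation, assuming PyTorch; the
# noise model here is made up purely for demonstration.
import torch


def dqn_target(reward, gamma, target_q_next):
    # Standard DQN target: max over (noisy) target-network values, which
    # biases the target upward when the estimates are noisy.
    return reward + gamma * target_q_next.max(dim=-1).values


def double_dqn_target(reward, gamma, online_q_next, target_q_next):
    # Double-DQN-style target: select the action with the online network,
    # evaluate it with the target network, reducing the upward bias.
    best_actions = online_q_next.argmax(dim=-1, keepdim=True)
    return reward + gamma * target_q_next.gather(-1, best_actions).squeeze(-1)


if __name__ == "__main__":
    torch.manual_seed(0)
    true_q = torch.zeros(10_000, 4)               # every true action value is 0
    noise_a = true_q + 0.5 * torch.randn_like(true_q)
    noise_b = true_q + 0.5 * torch.randn_like(true_q)
    r, gamma = torch.zeros(10_000), 0.99
    print("DQN target mean:       ", dqn_target(r, gamma, noise_a).mean().item())
    print("Double DQN target mean:", double_dqn_target(r, gamma, noise_b, noise_a).mean().item())
    # The first mean is clearly positive even though all true values are 0;
    # the second stays close to 0.
```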
“…How to estimate the value function well remains an ongoing problem in RL, and has been widely investigated for the deep Q-network (DQN) (Hasselt, Guez, and Silver 2016; Sabry and Khalifa 2019) in discrete control. Lan et al. (2020) propose taking the minimum Q-value under an ensemble scheme to control the estimation bias in DQN, while Anschel, Baram, and Shimkin (2017) leverage the average value of an ensemble of Q-networks for variance reduction.…”
Section: Related Work
confidence: 99%
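Neither ensemble method is detailed in this excerpt; the sketch below, assuming PyTorch and a list of Q-networks with identical interfaces, shows how a Maxmin-DQN-style minimum and an Averaged-DQN-style mean over ensemble predictions could each form the bootstrap value. The helper names are illustrative, not taken verbatim from Lan et al. or Anschel et al.

```python
# Sketch of ensemble-based bootstrap values, assuming PyTorch. `q_nets` is a
# list of Q-networks sharing the same input/output shapes.
import torch


@torch.no_grad()
def ensemble_q_values(q_nets, next_state):
    # Stack predictions into shape (num_nets, batch, num_actions).
    return torch.stack([net(next_state) for net in q_nets])


def maxmin_bootstrap(q_nets, next_state):
    # Maxmin-style: take the elementwise minimum across the ensemble first,
    # then maximize over actions, which counteracts overestimation bias.
    q = ensemble_q_values(q_nets, next_state)
    return q.min(dim=0).values.max(dim=-1).values


def averaged_bootstrap(q_nets, next_state):
    # Averaged-DQN-style: average the ensemble predictions to reduce target
    # variance, then maximize over actions.
    q = ensemble_q_values(q_nets, next_state)
    return q.mean(dim=0).max(dim=-1).values
```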