2019
DOI: 10.48550/arxiv.1910.05983
Preprint

On the Reduction of Variance and Overestimation of Deep Q-Learning

Mohammed Sabry,
Amr M. A. Khalifa

Abstract: The breakthrough of deep Q-Learning across different types of environments revolutionized the algorithmic design of Reinforcement Learning toward more stable and robust algorithms; to that end, many extensions to the deep Q-Learning algorithm have been proposed to reduce the variance of the target values and the overestimation phenomenon. In this paper, we examine a new methodology to address these issues: we propose applying Dropout techniques to the deep Q-Learning algorithm as a way to reduce variance and overestimation. …
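The abstract does not include an implementation, so the following is a minimal sketch, assuming a PyTorch setting, of how dropout layers could be placed inside a DQN-style Q-network and how several stochastic forward passes might be averaged to damp the variance of the target values. The layer sizes, dropout rate, and helper names are illustrative assumptions, not values taken from the paper.

```python
# Minimal sketch (not the authors' code): a DQN-style Q-network with dropout,
# assuming PyTorch. Layer sizes and the 0.1 dropout rate are illustrative.
import torch
import torch.nn as nn


class DropoutQNetwork(nn.Module):
    def __init__(self, state_dim: int, num_actions: int, p_drop: float = 0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128),
            nn.ReLU(),
            nn.Dropout(p=p_drop),      # dropout after each hidden layer
            nn.Linear(128, 128),
            nn.ReLU(),
            nn.Dropout(p=p_drop),
            nn.Linear(128, num_actions),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)


def averaged_target_q(q_net: DropoutQNetwork, next_state: torch.Tensor,
                      num_samples: int = 10) -> torch.Tensor:
    """Average several stochastic forward passes (dropout kept active)
    to obtain a lower-variance estimate of max_a Q(s', a)."""
    q_net.train()  # keep dropout active when forming the bootstrap value
    with torch.no_grad():
        samples = torch.stack([q_net(next_state) for _ in range(num_samples)])
    return samples.mean(dim=0).max(dim=-1).values
```

A target of the form `reward + gamma * (1 - done) * averaged_target_q(q_net, next_state)` would then replace the usual single-pass bootstrap value; this is one plausible reading of the approach, not the paper's exact procedure.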

Cited by 7 publications (7 citation statements)
References 15 publications
“…More recently, an optimization strategy called SVR-DQN was proposed, combining the stochastic variance reduced gradient (SVRG) technique (Johnson and Zhang 2013) with deep Q-learning. For more methods on variance reduction for deep Q-learning, refer to (Romoff et al. 2018; Sabry and Khalifa 2019). SVRG has also been applied to policy gradient methods in RL as an effective variance-reduction technique for stochastic optimization, such as off-line control (Xu, Liu, and Peng 2017), policy evaluation (Du et al. 2017), and on-policy control (Papini et al. 2018).…”
Section: Introduction
confidence: 99%
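SVR-DQN itself is not spelled out in this excerpt; as a rough sketch of the SVRG idea the citing paper points to, the snippet below applies a variance-reduced gradient correction to a Q-network mini-batch update, assuming PyTorch. The names (`snapshot_net`, `full_grads`, `loss_fn`) and the epoch bookkeeping are hypothetical, not SVR-DQN as published.

```python
# SVRG-style update sketch, assuming PyTorch. `q_net` is the current network,
# `snapshot_net` a frozen copy taken at the start of the epoch, and
# `full_grads` the per-parameter gradient of the loss over the full dataset
# (or replay buffer) evaluated at the snapshot. All names are hypothetical.
import torch


def svrg_step(q_net, snapshot_net, full_grads, loss_fn, batch, lr=1e-3):
    # Gradient of the mini-batch loss at the current parameters.
    q_net.zero_grad()
    loss_fn(q_net, batch).backward()
    # Gradient of the same mini-batch loss at the snapshot parameters.
    snapshot_net.zero_grad()
    loss_fn(snapshot_net, batch).backward()

    with torch.no_grad():
        for p, p_snap, mu in zip(q_net.parameters(),
                                 snapshot_net.parameters(),
                                 full_grads):
            # Variance-reduced gradient: g(w) - g(w_snapshot) + full gradient.
            vr_grad = p.grad - p_snap.grad + mu
            p -= lr * vr_grad

# Typical usage at the start of each SVRG epoch (hypothetical helpers):
#   snapshot_net = copy.deepcopy(q_net)
#   full_grads = compute_full_gradient(snapshot_net, replay_buffer)
```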
“…The DQN algorithm can provide optimal decision-making with minimal observations in relatively small and simple IoD environments. However, the DQN algorithm suffers from the overestimation issue, which causes a positive bias in the estimated action values, leading to sub-optimal policies [30].…”
Section: B. Overview of DRL Techniques for IoD
confidence: 99%
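The overestimation mentioned here stems from the max operator in the Q-learning target: with noisy estimates, E[max_a Q(s', a)] ≥ max_a E[Q(s', a)]. The toy snippet below, a sketch with made-up zero-mean noise rather than anything from the cited work, shows the upward bias of the standard DQN target and how a Double-DQN-style target that decouples action selection from evaluation avoids most of it.

```python
# Toy illustration of max-operator overestimation, assuming PyTorch; the
# noise model here is made up purely for demonstration.
import torch


def dqn_target(reward, gamma, target_q_next):
    # Standard DQN target: max over (noisy) target-network values, which
    # biases the target upward when the estimates are noisy.
    return reward + gamma * target_q_next.max(dim=-1).values


def double_dqn_target(reward, gamma, online_q_next, target_q_next):
    # Double-DQN-style target: select the action with the online network,
    # evaluate it with the target network, reducing the upward bias.
    best_actions = online_q_next.argmax(dim=-1, keepdim=True)
    return reward + gamma * target_q_next.gather(-1, best_actions).squeeze(-1)


if __name__ == "__main__":
    torch.manual_seed(0)
    true_q = torch.zeros(10_000, 4)               # every true action value is 0
    noise_a = true_q + 0.5 * torch.randn_like(true_q)
    noise_b = true_q + 0.5 * torch.randn_like(true_q)
    r, gamma = torch.zeros(10_000), 0.99
    print("DQN target mean:       ", dqn_target(r, gamma, noise_a).mean().item())
    print("Double DQN target mean:", double_dqn_target(r, gamma, noise_b, noise_a).mean().item())
    # The first mean is clearly positive even though all true values are 0;
    # the second stays close to 0.
```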
“…How to estimate the value function well remains an ongoing problem in RL, and has been widely investigated for the deep Q-network (DQN) (Hasselt, Guez, and Silver 2016; Sabry and Khalifa 2019) in discrete control. Lan et al. (2020) propose taking the minimum Q-value under an ensemble scheme to control the estimation bias in DQN, while Anschel, Baram, and Shimkin (2017) leverage the average value of an ensemble of Q-networks for variance reduction.…”
Section: Related Work
confidence: 99%
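Neither ensemble method is detailed in this excerpt; the sketch below, assuming PyTorch and a list of Q-networks with identical interfaces, shows how a Maxmin-DQN-style minimum and an Averaged-DQN-style mean over ensemble predictions could each form the bootstrap value. The helper names are illustrative, not taken verbatim from Lan et al. or Anschel et al.

```python
# Sketch of ensemble-based bootstrap values, assuming PyTorch. `q_nets` is a
# list of Q-networks sharing the same input/output shapes.
import torch


@torch.no_grad()
def ensemble_q_values(q_nets, next_state):
    # Stack predictions into shape (num_nets, batch, num_actions).
    return torch.stack([net(next_state) for net in q_nets])


def maxmin_bootstrap(q_nets, next_state):
    # Maxmin-style: take the elementwise minimum across the ensemble first,
    # then maximize over actions, which counteracts overestimation bias.
    q = ensemble_q_values(q_nets, next_state)
    return q.min(dim=0).values.max(dim=-1).values


def averaged_bootstrap(q_nets, next_state):
    # Averaged-DQN-style: average the ensemble predictions to reduce target
    # variance, then maximize over actions.
    q = ensemble_q_values(q_nets, next_state)
    return q.mean(dim=0).max(dim=-1).values
```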