2016
DOI: 10.48550/arxiv.1611.01929
Preprint

Averaged-DQN: Variance Reduction and Stabilization for Deep Reinforcement Learning

Abstract: Instability and variability of Deep Reinforcement Learning (DRL) algorithms tend to adversely affect their performance. Averaged-DQN is a simple extension to the DQN algorithm, based on averaging previously learned Q-value estimates, which leads to a more stable training procedure and improved performance by reducing approximation error variance in the target values. To understand the effect of the algorithm, we examine the source of value function estimation errors and provide an analytical comparison within …
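The mechanism described in the abstract is simple to sketch. The snippet below is a hypothetical illustration, not the authors' code: it assumes a list `prev_q_nets` holding the K most recently learned Q-networks and shows how a target could be formed by averaging their estimates before the usual DQN bootstrap.

```python
import torch

def averaged_dqn_target(reward, next_state, done, prev_q_nets, gamma=0.99):
    """Hypothetical sketch of an Averaged-DQN-style target: average the
    Q-value estimates of the K previously learned networks, then bootstrap
    with the max over actions, as in standard DQN."""
    with torch.no_grad():
        # Average Q(s', .) over the K stored network snapshots.
        q_avg = torch.stack([net(next_state) for net in prev_q_nets]).mean(dim=0)
        # Standard DQN bootstrap, but on the averaged estimate.
        return reward + gamma * (1.0 - done) * q_avg.max(dim=1).values
```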

Cited by 7 publications (7 citation statements)
References 21 publications
“…In this way, when selecting its next actions, it chooses the action that will bring the highest reward value. The neural network is updated with a function called the Q-function [12]. This function is given in Equation 1.…”
Section: Figure 2, Markov Decision Process (unclassified)
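The "Equation 1" referred to in this excerpt is not reproduced on this page; as an assumption, the standard tabular Q-learning update it most likely denotes is sketched below, with α the learning rate and γ the discount factor.

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \Big[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \Big]
```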
“…However, this table, which holds the Q-values and is called the Q-table, has problems such as capacity, cost, and the need to know every state in advance in order to prepare it. To eliminate these problems, the DQN algorithm emerged: it uses a neural network that estimates the Q-values and, by generalizing over states, removes the need to know them in advance and to prepare a table [12].…”
Section: DQN Algorithm (unclassified)
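To illustrate the point made in the excerpt above (a Q-table replaced by a function approximator), here is a minimal, hypothetical DQN-style Q-network sketch; the class name, layer sizes, and hidden width are arbitrary assumptions, not taken from the cited work.

```python
import torch.nn as nn

class QNetwork(nn.Module):
    """Hypothetical sketch: a small MLP that maps a state vector to one
    Q-value per action, replacing an explicit Q-table."""
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),  # one output per action: Q(s, a)
        )

    def forward(self, state):
        return self.net(state)
```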
“…The improvement of AFBC over standard BC rests on our ability to accurately estimate the advantage of (s, a) pairs, which makes the method vulnerable to bias in the critic learning process. Critic overestimation and error propagation are thoroughly investigated problems in recent work [13,28,26,3,23,2]. REDQ [7] trains an ensemble of critic networks on target values generated by a random subset of that ensemble and provides an effective bias-variance trade-off.…”
Section: Alternative Binary Filters, Enhanced Critic Updates, and Futur... (mentioning)
confidence: 99%
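For readers unfamiliar with the REDQ mechanism mentioned in the excerpt above, the following hypothetical sketch shows how a target can be built from a random subset of an ensemble of critics. It illustrates the described idea only; it is not REDQ's reference implementation, and all names and the subset size are placeholders.

```python
import random
import torch

def redq_style_target(reward, next_state, next_action, done,
                      critic_ensemble, subset_size=2, gamma=0.99):
    """Hypothetical sketch: sample a random subset of the critic ensemble,
    take the minimum of their estimates, and bootstrap the target on it."""
    with torch.no_grad():
        subset = random.sample(critic_ensemble, subset_size)
        q_values = torch.stack([c(next_state, next_action) for c in subset])
        q_min = q_values.min(dim=0).values  # pessimistic estimate over the subset
        return reward + gamma * (1.0 - done) * q_min
```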
“…4) DQN Variance: The sources of DQN variance are Approximation Gradient Error (AGE) [23] and Target Approximation Error (TAE) [24]. With Approximation Gradient Error, the error in estimating the gradient direction of the cost function leads to inaccurate and widely differing predictions along the learning trajectory across episodes, because of unseen state transitions and the finite size of the experience replay buffer.…”
Section: B Reinforcement Learning (mentioning)
confidence: 99%
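As context for the TAE term cited above, the Averaged-DQN paper indexed on this page characterizes the Target Approximation Error at iteration i roughly as the gap between the fitted Q-network and the target it was regressed toward. The sketch below is a paraphrase from memory, not a quotation of the paper's exact notation.

```latex
Z^{i}_{s,a} = Q(s,a;\theta_i) - \hat{y}^{\,i}_{s,a},
\qquad
\hat{y}^{\,i}_{s,a} = \mathbb{E}\big[\, r + \gamma \max_{a'} Q(s',a';\theta_{i-1}) \,\big|\, s,a \big]
```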