2016
DOI: 10.48550/arxiv.1611.01929
Preprint

Averaged-DQN: Variance Reduction and Stabilization for Deep Reinforcement Learning

Abstract: Instability and variability of Deep Reinforcement Learning (DRL) algorithms tend to adversely affect their performance. Averaged-DQN is a simple extension to the DQN algorithm, based on averaging previously learned Q-value estimates, which leads to a more stable training procedure and improved performance by reducing approximation error variance in the target values. To understand the effect of the algorithm, we examine the source of value function estimation errors and provide an analytical comparison within …
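The mechanism described in the abstract is simple to sketch. The snippet below is a hypothetical illustration, not the authors' code: it assumes a list `prev_q_nets` holding the K most recently learned Q-networks and shows how a target could be formed by averaging their estimates before the usual DQN bootstrap.

```python
import torch

def averaged_dqn_target(reward, next_state, done, prev_q_nets, gamma=0.99):
    """Hypothetical sketch of an Averaged-DQN-style target: average the
    Q-value estimates of the K previously learned networks, then bootstrap
    with the max over actions, as in standard DQN."""
    with torch.no_grad():
        # Average Q(s', .) over the K stored network snapshots.
        q_avg = torch.stack([net(next_state) for net in prev_q_nets]).mean(dim=0)
        # Standard DQN bootstrap, but on the averaged estimate.
        return reward + gamma * (1.0 - done) * q_avg.max(dim=1).values
```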

Cited by 7 publications (7 citation statements)
References 21 publications
“…In this way, when selecting its next actions, it chooses the action that will bring the highest reward value. The neural network is updated with a function called the Q-function [12]. This function is given in Equation 1.…”
Section: Figure 2, Markov Decision Process (unclassified)
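The "Equation 1" referred to in this excerpt is not reproduced on this page; as an assumption, the standard tabular Q-learning update it most likely denotes is sketched below, with α the learning rate and γ the discount factor.

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \Big[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \Big]
```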
“…However, this table, which holds the Q-values and is called the Q-table, has problems such as capacity, cost, and the need to know every state in advance in order to prepare it. To eliminate these problems, the DQN algorithm emerged: it uses a neural network that estimates the Q-values and, by generalizing over states, removes the need to know them in advance and to prepare a table [12].…”
Section: DQN Algorithm (unclassified)
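To illustrate the point made in the excerpt above (a Q-table replaced by a function approximator), here is a minimal, hypothetical DQN-style Q-network sketch; the class name, layer sizes, and hidden width are arbitrary assumptions, not taken from the cited work.

```python
import torch.nn as nn

class QNetwork(nn.Module):
    """Hypothetical sketch: a small MLP that maps a state vector to one
    Q-value per action, replacing an explicit Q-table."""
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),  # one output per action: Q(s, a)
        )

    def forward(self, state):
        return self.net(state)
```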
“…The improvement of AFBC over standard BC rests on our ability to accurately estimate the advantage of (s, a) pairs, which makes the method vulnerable to bias in the critic learning process. Critic overestimation and error propagation are thoroughly investigated problems in recent work [13,28,26,3,23,2]. REDQ [7] trains an ensemble of critic networks on target values generated by a random subset of that ensemble and provides an effective bias-variance trade-off.…”
Section: Alternative Binary Filters, Enhanced Critic Updates, and Futur... (mentioning)
confidence: 99%
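For readers unfamiliar with the REDQ mechanism mentioned in the excerpt above, the following hypothetical sketch shows how a target can be built from a random subset of an ensemble of critics. It illustrates the described idea only; it is not REDQ's reference implementation, and all names and the subset size are placeholders.

```python
import random
import torch

def redq_style_target(reward, next_state, next_action, done,
                      critic_ensemble, subset_size=2, gamma=0.99):
    """Hypothetical sketch: sample a random subset of the critic ensemble,
    take the minimum of their estimates, and bootstrap the target on it."""
    with torch.no_grad():
        subset = random.sample(critic_ensemble, subset_size)
        q_values = torch.stack([c(next_state, next_action) for c in subset])
        q_min = q_values.min(dim=0).values  # pessimistic estimate over the subset
        return reward + gamma * (1.0 - done) * q_min
```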
“…4) DQN Variance: The sources of DQN variance are Approximation Gradient Error (AGE) [23] and Target Approximation Error (TAE) [24]. With Approximation Gradient Error, the error in estimating the gradient direction of the cost function leads to inaccurate and widely differing predictions along the learning trajectory across episodes, because of unseen state transitions and the finite size of the experience replay buffer.…”
Section: B Reinforcement Learning (mentioning)
confidence: 99%
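As context for the TAE term cited above, the Averaged-DQN paper indexed on this page characterizes the Target Approximation Error at iteration i roughly as the gap between the fitted Q-network and the target it was regressed toward. The sketch below is a paraphrase from memory, not a quotation of the paper's exact notation.

```latex
Z^{i}_{s,a} = Q(s,a;\theta_i) - \hat{y}^{\,i}_{s,a},
\qquad
\hat{y}^{\,i}_{s,a} = \mathbb{E}\big[\, r + \gamma \max_{a'} Q(s',a';\theta_{i-1}) \,\big|\, s,a \big]
```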