2020
DOI: 10.48550/arxiv.2007.12817
Preprint

Variance Reduction for Deep Q-Learning using Stochastic Recursive Gradient

Abstract: Deep Q-learning algorithms often suffer from poor gradient estimates with excessive variance, resulting in unstable training and poor sampling efficiency. Stochastic variance-reduced gradient methods such as SVRG have been applied to reduce the estimation variance. However, due to the online instance-generation nature of reinforcement learning, directly applying SVRG to deep Q-learning faces the problem of inaccurate estimation of the anchor points, which dramatically limits the potential of SVRG…
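For context, the SVRG estimator the abstract refers to replaces the plain mini-batch gradient with a control-variate form: the mini-batch gradient at the current weights, minus the same batch's gradient at fixed anchor weights, plus a full-gradient estimate computed at the anchor. The sketch below is a minimal illustration of that generic estimator applied to a one-step TD loss, assuming PyTorch; it is not the paper's recursive-gradient method, and all names (q_net, anchor_net, mu_anchor, td_loss) are hypothetical.

import torch
import torch.nn as nn

def td_loss(net, target_net, batch, gamma=0.99):
    """One-step TD loss for a batch of (s, a, r, s_next, done) transitions."""
    s, a, r, s_next, done = batch
    q = net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        q_next = target_net(s_next).max(dim=1).values
        target = r + gamma * (1.0 - done) * q_next
    return nn.functional.mse_loss(q, target)

def svrg_gradient(q_net, anchor_net, target_net, mu_anchor, batch):
    """SVRG-style variance-reduced gradient:
    g = grad_B(w) - grad_B(w_anchor) + mu_anchor,
    where grad_B is the mini-batch TD-loss gradient, anchor_net holds the
    frozen anchor weights (same architecture as q_net), and mu_anchor is a
    list of tensors estimating the full gradient at the anchor."""
    # Mini-batch gradient at the current parameters.
    q_net.zero_grad()
    td_loss(q_net, target_net, batch).backward()
    g_w = [p.grad.detach().clone() for p in q_net.parameters()]

    # Same batch, same loss, evaluated at the anchor parameters.
    anchor_net.zero_grad()
    td_loss(anchor_net, target_net, batch).backward()
    g_anchor = [p.grad.detach().clone() for p in anchor_net.parameters()]

    # Control-variate correction.
    return [gw - ga + mu for gw, ga, mu in zip(g_w, g_anchor, mu_anchor)]

The abstract's point is that in online RL the full-gradient term mu_anchor cannot be computed over a fixed dataset and must itself be estimated from freshly sampled transitions, so an inaccurate anchor estimate undermines the variance reduction.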

Cited by 1 publication (1 citation statement)
References: 11 publications
“…Multiple training attempts with the same beginning conditions will invariably result in somewhat different trading strategies with various results. This is also observed in the research study of Thibaut et al. [68] and is also discussed by Jia et al. [69]. Hence it is necessary to validate the model with multiple trials and report the average.…”
Section: Stock Name Portfolio Value (supporting)
Confidence: 62%
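The multi-trial protocol the quoted statement recommends is straightforward to express in code. The sketch below is a hypothetical illustration: train_and_evaluate is a placeholder for any seeded train/backtest routine, and the returned values are stand-ins, not real results.

import random
import statistics

def train_and_evaluate(seed: int) -> float:
    """Placeholder: train the agent and backtest it under the given seed."""
    random.seed(seed)
    # ... train the trading agent and run the backtest here ...
    return random.uniform(0.9, 1.3)  # stand-in for the real portfolio metric

# Repeat training under identical starting conditions but different seeds,
# then report the average (and spread) rather than a single run.
results = [train_and_evaluate(seed) for seed in range(10)]
print(f"mean = {statistics.mean(results):.3f}  "
      f"stdev = {statistics.stdev(results):.3f}")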