2020 Third International Conference on Artificial Intelligence for Industries (AI4I)
DOI: 10.1109/ai4i49448.2020.00014
Deep Reinforcement Learning using Cyclical Learning Rates

Abstract: Deep Reinforcement Learning (DRL) methods often rely on the meticulous tuning of hyperparameters to successfully resolve problems. One of the most influential parameters in optimization procedures based on stochastic gradient descent (SGD) is the learning rate. We investigate cyclical learning and propose a method for defining a general cyclical learning rate for various DRL problems. In this paper we present a method for cyclical learning applied to complex DRL problems. Our experiments show that, utilizing c…

Cited by 6 publications (4 citation statements)
References 11 publications
“…In deep reinforcement learning, the single-step average reward value of each episode is an important indicator to measure the training effect [ 30 , 31 , 32 , 33 ]. This paper counts the average single-step rewards of [ 22 ] and DCPER-DDPG algorithm in 6000 episodes.…”
Section: Results Analysis
confidence: 99%
“…Thus, having an adaptive Discount Factor could lead to better learning performance and adapt itself when results (in our case, QoE value) are wrong. In the same way as the discount factor, with an adaptive learning rate such as a cyclical learning rate, which consists of varying the learning rate cyclically between two boundary values [20], there would be no need to find the best values and schedule for the global learning rates, so accuracy could be improved in fewer iterations.…”
Section: Discussion
confidence: 99%
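The cyclical schedule referenced in the statement above — varying the learning rate between two boundary values [20] — can be sketched as a triangular cycle. The boundary values (`base_lr`, `max_lr`) and half-cycle length (`step_size`) below are illustrative assumptions, not values from the paper:

```python
import math

def cyclical_lr(step, base_lr=1e-4, max_lr=1e-3, step_size=2000):
    """Triangular cyclical learning rate.

    The rate rises linearly from base_lr to max_lr over step_size
    training steps, then falls back, repeating each full cycle of
    2 * step_size steps.
    """
    cycle = math.floor(1 + step / (2 * step_size))   # which cycle we are in
    x = abs(step / step_size - 2 * cycle + 1)        # position within the cycle
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)
```

In a training loop, the returned value would be assigned to the optimizer's learning rate at each step, removing the need to hand-pick a single fixed rate or a decay schedule.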
“…In DRL, it is necessary to provide the agent with a set of optimal hyperparameters to improve the performance and effect of learning [24].…”
Section: The Selection of Hyperparameter
confidence: 99%