2021
DOI: 10.48550/arxiv.2110.09057
Preprint

Training Deep Neural Networks with Adaptive Momentum Inspired by the Quadratic Optimization

Abstract: Heavy ball momentum is crucial in accelerating (stochastic) gradient-based optimization algorithms for machine learning. Existing heavy ball momentum is usually weighted by a uniform hyperparameter, which relies on excessive tuning. Moreover, the calibrated fixed hyperparameter may not lead to optimal performance. In this paper, to eliminate the effort for tuning the momentum-related hyperparameter, we propose a new adaptive momentum inspired by the optimal choice of the heavy ball momentum for quadratic optim…
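To make the idea in the abstract concrete, the sketch below shows plain heavy ball momentum with an illustrative adaptive choice of the momentum parameter, derived from the classical optimal value for quadratics. The function name `heavy_ball_adaptive` and the curvature-estimation rule are assumptions for illustration only, not the paper's exact algorithm.

```python
import numpy as np

def heavy_ball_adaptive(grad, x0, eta=0.01, n_iters=500, eps=1e-12):
    """Gradient descent with heavy ball momentum and an illustrative
    adaptive momentum parameter (hypothetical rule, not the paper's).

    The momentum beta follows the classical optimal value for quadratics,
        beta* = ((sqrt(L) - sqrt(mu)) / (sqrt(L) + sqrt(mu)))**2,
    using crude running estimates of the curvature bounds L and mu from
    secant ratios ||g_k - g_{k-1}|| / ||x_k - x_{k-1}||.
    """
    x_prev, x = x0.copy(), x0.copy()
    g_prev = None
    L_hat, mu_hat = None, None
    beta = 0.0
    for k in range(n_iters):
        g = grad(x)
        if k > 0:
            s, y = x - x_prev, g - g_prev
            curv = np.linalg.norm(y) / (np.linalg.norm(s) + eps)  # local curvature estimate
            L_hat = curv if L_hat is None else max(L_hat, curv)
            mu_hat = curv if mu_hat is None else min(mu_hat, curv)
            beta = ((np.sqrt(L_hat) - np.sqrt(mu_hat)) /
                    (np.sqrt(L_hat) + np.sqrt(mu_hat) + eps)) ** 2
        x_new = x - eta * g + beta * (x - x_prev)  # heavy ball update
        x_prev, x, g_prev = x, x_new, g
    return x

# Example: quadratic f(x) = 0.5 * x^T A x with a spread-out spectrum.
A = np.diag([1.0, 10.0, 100.0])
x_star = heavy_ball_adaptive(lambda x: A @ x, x0=np.ones(3), eta=0.01, n_iters=2000)
print(np.linalg.norm(x_star))  # should be close to 0
```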

Cited by 6 publications (14 citation statements) | References 31 publications
“…And 3) HBNODEs can learn long-term dependencies effectively, capturing intrinsic patterns from data. There are numerous avenues for future work, and two particularly interesting directions in our mind are 1) improving HBNODEs, particularly replacing the fine-tuned or learned damping parameter with an adaptive one motivated by certain optimization algorithms with adaptive momentum [55,54,57], and 2) applying HBNODE-based ROMs to model reduction arising from scientific challenges, especially when we do not have the ground truth governing equation of the dynamical systems.…”
Section: Discussion
confidence: 99%
“…These optimal hyperparameters require knowledge of the Lipschitz constant L and μ, which are generally inaccessible. Since the SHB method has produced great practical results, it has been studied by many researchers in both convex and nonconvex settings [30,47,46,44,45,42]. Besides, SHB can escape saddle points with a larger learning rate [43] and has successfully improved training speed and accuracy in image classification tasks with deep neural networks (DNNs) [6,18,48].…”
Section: Stochastic Heavy Ball and Adaptive Momentum
confidence: 99%
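For reference, the optimal hyperparameters mentioned in the statement above have a closed form for a strongly convex quadratic. The following minimal sketch (function name assumed for illustration, not from the cited papers) computes them from the curvature bounds L and μ, which is exactly the knowledge that is generally inaccessible in practice.

```python
import math

def optimal_heavy_ball(L, mu):
    """Classical optimal heavy-ball hyperparameters for a strongly convex
    quadratic whose Hessian eigenvalues lie in [mu, L] (Polyak, 1964):
        step size  eta*  = 4 / (sqrt(L) + sqrt(mu))**2
        momentum   beta* = ((sqrt(L) - sqrt(mu)) / (sqrt(L) + sqrt(mu)))**2
    Both require knowing L and mu, which makes the tuning hard in practice.
    """
    sL, smu = math.sqrt(L), math.sqrt(mu)
    eta = 4.0 / (sL + smu) ** 2
    beta = ((sL - smu) / (sL + smu)) ** 2
    return eta, beta

# Example: condition number kappa = L / mu = 100.
eta, beta = optimal_heavy_ball(L=100.0, mu=1.0)
print(eta, beta)  # ~0.0331, ~0.669
```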
“…Finding the optimal hyperparameters and computing them directly before training begins is difficult and computationally expensive. To this end, an adaptive method has been developed for the SHB momentum that uses historical information [44]. To the best of our knowledge, there is no principled way to fine-tune an optimization method; it is thus natural to raise the question: can we establish a simple method for tuning the normalized SHB method and guarantee its convergence?…”
Section: Introduction
confidence: 99%
“…The learning rate plays an important role in neural network training [32,44,45]. If it is too large, training may fail to converge; if it is too small, training will converge slowly. Up to now, several works on learning rate strategies for neural network training have appeared, which can be summarized into the following categories.…”
Section: Related Work
confidence: 99%
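A toy numeric illustration of this trade-off (assumed setup, not taken from the cited work): plain gradient descent on a one-dimensional quadratic diverges when the learning rate exceeds 2/L and crawls when it is far below 1/L.

```python
# Gradient descent on f(x) = 0.5 * L * x**2 with L = 10:
# eta > 2/L diverges, eta near 1/L converges quickly, tiny eta is slow.
L = 10.0
grad = lambda x: L * x

for eta in (0.25, 0.1, 0.001):   # too large, well-chosen, too small
    x = 1.0
    for _ in range(100):
        x = x - eta * grad(x)
    print(f"eta={eta:<6} |x| after 100 steps: {abs(x):.3e}")
```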