2018
DOI: 10.48550/arxiv.1810.06801
Preprint

Quasi-hyperbolic momentum and Adam for deep learning

Abstract: Momentum-based acceleration of stochastic gradient descent (SGD) is widely used in deep learning. We propose the quasi-hyperbolic momentum algorithm (QHM) as an extremely simple alteration of momentum SGD, averaging a plain SGD step with a momentum step. We describe numerous connections to and identities with other algorithms, and we characterize the set of two-state optimization algorithms that QHM can recover. Finally, we propose a QH variant of Adam called QHAdam, and we empirically demonstrate that our alg…
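
The update rule the abstract alludes to is compact enough to state directly. Below is a minimal NumPy sketch of that rule; the function name, learning rate, and toy usage are illustrative, while the two recurrences and the defaults ν = 0.7, β = 0.999 follow the paper's description and its recommended starting values.

# Minimal sketch of the QHM update: the parameter step is a nu-weighted
# average of a plain SGD step and a momentum step (names are illustrative).
import numpy as np

def qhm_step(theta, momentum_buf, grad, lr=0.1, beta=0.999, nu=0.7):
    # Normalized exponential moving average of gradients (momentum buffer):
    #   g_{t+1} = beta * g_t + (1 - beta) * grad_t
    momentum_buf = beta * momentum_buf + (1 - beta) * grad
    # QHM step: average the plain SGD direction with the momentum direction.
    #   theta_{t+1} = theta_t - lr * ((1 - nu) * grad_t + nu * g_{t+1})
    # nu = 0 recovers plain SGD; nu = 1 recovers momentum SGD (with a
    # normalized buffer).
    theta = theta - lr * ((1 - nu) * grad + nu * momentum_buf)
    return theta, momentum_buf

# Toy usage: minimize f(theta) = 0.5 * ||theta||^2, whose gradient is theta.
theta = np.array([1.0, -2.0])
buf = np.zeros_like(theta)
for _ in range(100):
    theta, buf = qhm_step(theta, buf, grad=theta)
print(theta)  # moves toward the minimizer at the origin

QHAdam (also proposed in the paper) applies the same quasi-hyperbolic weighting idea within Adam's update.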


Cited by 19 publications (29 citation statements)
References 22 publications
“…Indeed, along with HB and NAG, the QHM method can also be seen as a numerical integrator on GM-ODE. QHM was shown to be very competitive in deep learning tasks (Choi et al., 2019) as well as in the strongly-convex setting (see Appendix J of Ma and Yarats, 2018). However, to the best of our knowledge, QHM has only been studied in the quadratic case (Gitman et al., 2019) (hence the novelty of our rate).…”
Section: Summary of the Results (mentioning)
confidence: 99%
“…3.2 in Khalil and Grizzle (2002)). The model above is inspired by the quasi-hyperbolic momentum (QHM) algorithm developed in Ma and Yarats (2018). We discuss the connection to QHM later in Sec.…”
Section: Continuous-time Analysis (mentioning)
confidence: 99%
“…The most basic improvements of gradient descent are momentum and Nesterov acceleration. There is a large body of current research either analyzing or suggesting modifications to (non-adaptive) momentum-based methods (Wibisono and Wilson, 2015; Wibisono et al., 2016; Yuan et al., 2016; Jin et al., 2017; Lucas et al., 2018; Ma and Yarats, 2018; Cyrus et al., 2018; Srinivasan et al., 2018; Kovachki and Stuart, 2019; Chen and Kyrillidis, 2019; Gitman et al., 2019).…”
Section: Discussion, Context and Recommendations (mentioning)
confidence: 99%