2018 · Preprint
DOI: 10.48550/arxiv.1803.05591

On the insufficiency of existing momentum schemes for Stochastic Optimization

Cited by 5 publications (6 citation statements) · References 0 publications

“…An important and challenging future direction is to analyze the convergence of adaptive gradient methods, including our algorithm, for nonconvex functions. Since our algorithm can also be seen as an interpolation between Amsgrad and SGD with momentum, we can borrow some ideas from the convergence analysis of SGD with momentum (Kidambi et al., 2018) to facilitate the analysis in this direction. It would also be interesting to see how well Padam performs in recurrent neural networks (RNNs) (Hochreiter and Schmidhuber, 1997) and generative adversarial networks (GANs) (Goodfellow et al., 2014).…”
Section: Discussion (citation type: mentioning)
confidence: 99%
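As a concrete reference for the interpolation mentioned in this statement, here is a minimal sketch of a Padam-style update with a partial adaptive parameter p; the helper name padam_step, the default hyperparameters, and the omission of bias correction are illustrative assumptions, not taken from the cited papers. Setting p = 0 reduces the step to SGD with momentum, while p = 0.5 gives an AMSGrad-style fully adaptive step.

```python
import numpy as np

def padam_step(theta, grad, state, lr=0.1, beta1=0.9, beta2=0.999,
               p=0.125, eps=1e-8):
    """One Padam-style update with partial adaptive parameter p.

    p = 0   recovers SGD with momentum (no adaptive scaling),
    p = 0.5 recovers an AMSGrad-style fully adaptive step.
    """
    m, v, v_hat = state
    m = beta1 * m + (1 - beta1) * grad         # first-moment (momentum) EMA
    v = beta2 * v + (1 - beta2) * grad**2      # second-moment EMA
    v_hat = np.maximum(v_hat, v)               # AMSGrad-style running maximum
    theta = theta - lr * m / (v_hat + eps)**p  # partially adaptive scaling
    return theta, (m, v, v_hat)

# toy usage
theta = np.zeros(3)
state = (np.zeros(3), np.zeros(3), np.zeros(3))
theta, state = padam_step(theta, np.array([0.5, -1.0, 2.0]), state)
```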
“…The convergence of momentum methods has been studied extensively, both theoretically and empirically (Wibisono & Wilson, 2015; Wibisono et al., 2016; Kidambi et al., 2018). By analyzing the failure modes of existing methods, these works motivate successful momentum schemes.…”
Section: Related Work (citation type: mentioning)
confidence: 99%
“…In Adam and many other exp-avg methods, the effective stepsize η_k is controlled by the scaling term h_k, with η_k ∝ 1/h_k. However, using the exponential average (9), we are unable to guarantee that the stepsize will be diminishing.…”
Section: Exponential Moving Average (citation type: mentioning)
confidence: 99%
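To make the stepsize remark concrete, here is a small numerical sketch with toy values (not from the cited paper): once the recent gradients shrink, the Adam-style EMA v decays and the effective stepsize ∝ 1/√v grows again, whereas an AMSGrad-style running maximum keeps it non-increasing.

```python
import numpy as np

# Illustrative toy run: gradients shrink partway through training.
beta2, eps = 0.9, 1e-8
grads = [1.0, 1.0, 1.0, 0.01, 0.01, 0.01]

v, v_max = 0.0, 0.0
for k, g in enumerate(grads, start=1):
    v = beta2 * v + (1 - beta2) * g**2   # Adam-style EMA of squared gradients
    v_max = max(v_max, v)                # AMSGrad-style running maximum
    print(f"k={k}: 1/sqrt(v) = {1/np.sqrt(v + eps):6.2f}   "
          f"1/sqrt(v_max) = {1/np.sqrt(v_max + eps):6.2f}")
```

Running this, 1/sqrt(v) starts increasing again at k = 4, while 1/sqrt(v_max) stays flat, which is the sense in which the EMA alone cannot guarantee a diminishing stepsize.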
“…[25] demonstrated the importance of momentum and the advantage of Nesterov's accelerated gradient method in training deep neural nets. Recent works [7, 9] propose more robust accelerated SGD with improved statistical error. In addition, many studies investigate convergence issues for nonconvex optimization, including the theoretical concern about convergence to saddle points and how to escape from them.…”
Section: Introduction (citation type: mentioning)
confidence: 99%
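For readers comparing the schemes referenced above, a minimal sketch of the two classical momentum updates, heavy-ball (SGD with momentum) and Nesterov's accelerated gradient, in the velocity form common in deep-learning frameworks; grad_fn, the toy quadratic, and the hyperparameters are illustrative placeholders, not taken from the cited works.

```python
import numpy as np

def heavy_ball_step(theta, buf, grad_fn, lr=0.1, mu=0.9):
    """SGD with (heavy-ball) momentum: gradient evaluated at the current iterate."""
    buf = mu * buf + grad_fn(theta)
    return theta - lr * buf, buf

def nesterov_step(theta, buf, grad_fn, lr=0.1, mu=0.9):
    """Nesterov's accelerated gradient: gradient evaluated at a look-ahead point."""
    buf = mu * buf + grad_fn(theta - lr * mu * buf)
    return theta - lr * buf, buf

# toy stochastic quadratic: f(x) = ||x||^2 with noisy gradients
rng = np.random.default_rng(0)
grad_fn = lambda x: 2.0 * x + 0.01 * rng.standard_normal(x.shape)
theta, buf = np.ones(2), np.zeros(2)
for _ in range(50):
    theta, buf = nesterov_step(theta, buf, grad_fn)
```

The only difference between the two steps is where the stochastic gradient is queried: at the current iterate for heavy-ball, or at the momentum look-ahead point for Nesterov.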