2022
DOI: 10.48550/arxiv.2203.13273
Preprint

A DNN Optimizer that Improves over AdaBelief by Suppression of the Adaptive Stepsize Range

Abstract: Adam and AdaBelief compute and make use of elementwise adaptive stepsizes in training deep neural networks (DNNs) by tracking the exponential moving average (EMA) of the squared gradient g_t^2 and of the squared prediction error (m_t - g_t)^2, respectively, where m_t is the first momentum at iteration t and can be viewed as a prediction of g_t. In this work, we attempt to find out whether layerwise gradient statistics can be exploited in Adam and AdaBelief to allow for more effective training of DNNs. We address the above …
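
To make the abstract's description concrete, the following is a minimal NumPy sketch of the two EMA statistics it refers to: Adam tracks the squared gradient g_t^2, while AdaBelief tracks the squared prediction error (m_t - g_t)^2, and both divide the stepsize elementwise by the square root of the tracked EMA. The function name, hyperparameter defaults, and bias-correction steps follow the standard Adam/AdaBelief formulations and are illustrative assumptions, not code taken from this preprint.

```python
import numpy as np

def adam_like_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
                   eps=1e-8, belief=False):
    """One elementwise update in the style of Adam (belief=False) or
    AdaBelief (belief=True). Illustrative sketch only."""
    # First momentum m_t: EMA of the gradients, a prediction of g_t.
    m = beta1 * m + (1 - beta1) * grad
    if belief:
        # AdaBelief: EMA of the squared prediction error (m_t - g_t)^2.
        v = beta2 * v + (1 - beta2) * (grad - m) ** 2
    else:
        # Adam: EMA of the squared gradient g_t^2.
        v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction, as in the standard formulations.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Elementwise adaptive stepsize lr / (sqrt(v_hat) + eps).
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

# Toy usage on a single parameter vector:
p, m, v = np.zeros(3), np.zeros(3), np.zeros(3)
for t in range(1, 6):
    g = np.random.randn(3)  # stand-in for a minibatch gradient
    p, m, v = adam_like_step(p, g, m, v, t, belief=True)
```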

Cited by 0 publications
References 12 publications