2021
DOI: 10.48550/arxiv.2106.02888
Preprint

Bandwidth-based Step-Sizes for Non-Convex Stochastic Optimization

Xiaoyu Wang, Mikael Johansson

Abstract: Many popular learning-rate schedules for deep neural networks combine a decaying trend with local perturbations that attempt to escape saddle points and bad local minima. We derive convergence guarantees for bandwidth-based step-sizes, a general class of learning-rates that are allowed to vary in a banded region. This framework includes cyclic and non-monotonic step-sizes for which no theoretical guarantees were previously known. We provide worst-case guarantees for SGD on smooth non-convex problems under seve…
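To make the banded-region idea concrete, here is a minimal Python sketch of a bandwidth-based step-size. The 1/√t envelope, the parameter names, and the cyclic oscillation inside the band are illustrative assumptions for this sketch, not the paper's exact schedules; the framework only requires that the step-size stay between the two boundaries.

```python
import math

def bandwidth_step_size(t, eta0=1.0, c_low=0.5, c_high=2.0, period=50):
    """Illustrative bandwidth-based step-size (hypothetical parameters).

    The step-size may vary arbitrarily -- here, cyclically -- as long as
    it stays inside the band [c_low*eta0/sqrt(t), c_high*eta0/sqrt(t)].
    """
    lower = c_low * eta0 / math.sqrt(t)   # lower boundary of the band
    upper = c_high * eta0 / math.sqrt(t)  # upper boundary of the band
    # Cosine wave sweeping between the lower and upper boundary;
    # any other (even non-monotonic) choice inside the band is admissible.
    phase = 0.5 * (1.0 + math.cos(2.0 * math.pi * (t % period) / period))
    return lower + phase * (upper - lower)

# Example: the first few step-sizes (t starts at 1 to avoid division by zero).
print([round(bandwidth_step_size(t), 4) for t in range(1, 6)])
```

Because both boundaries decay at the same rate, the schedule inherits the decaying trend while still oscillating locally, which is the combination the abstract describes.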

Cited by 1 publication (6 citation statements)
References: 24 publications
“…• We consider a broad class of step-sizes that are only required to be smaller than a constant O(1/L). Our results cover many of the most popular step-size policies, including the classical constant and polynomial decay step-sizes, as well as more recently proposed time-dependent step-sizes such as stage-wise decay [Li et al., 2021, Wang et al., 2021], cosine with or without restart [Loshchilov and Hutter, 2017], and bandwidth-based step-sizes [Wang and Johansson, 2021, Wang and Yuan, 2021].…”
Section: Contributions
Citation type: mentioning (confidence: 78%)
“…Theorem 3.1 guarantees uniform boundedness of the SGD iterates for any (possibly non-monotonic) step-size upper bounded by θ₁/((ρ+1)L²). This includes common step-size policies such as constant, polynomial decay [Moulines and Bach, 2011], step-decay [Wang et al., 2021], and exponential decay [Li et al., 2021], as well as more recently proposed non-monotonic step-sizes such as the bandwidth-based [Wang and Yuan, 2021, Wang and Johansson, 2021] and cosine [Loshchilov and Hutter, 2017] step-sizes.…”
Section: Uniform Boundedness Properties of SGD
Citation type: mentioning (confidence: 99%)
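The condition in this citation is easy to check mechanically: any schedule qualifies once it is clipped below a uniform cap. A minimal sketch follows; the constant GAMMA_MAX and all schedule parameters are placeholders standing in for the theorem's bound θ₁/((ρ+1)L²), not values from the paper.

```python
import math

# Assumed cap; in the theorem this constant depends on L, rho, and theta_1.
GAMMA_MAX = 0.1

def capped(schedule):
    """Wrap a schedule so that eta_t <= GAMMA_MAX for all t."""
    return lambda t: min(schedule(t), GAMMA_MAX)

# The schedule families listed in the citation, with illustrative parameters.
schedules = {
    "constant":   capped(lambda t: 0.1),
    "poly_decay": capped(lambda t: 1.0 / math.sqrt(t + 1)),
    "step_decay": capped(lambda t: 1.0 * 0.5 ** (t // 100)),
    "exp_decay":  capped(lambda t: 1.0 * math.exp(-0.01 * t)),
    "cosine":     capped(lambda t: 0.05 * (1 + math.cos(math.pi * min(t, 200) / 200))),
}

# Every wrapped schedule, monotone or not, satisfies the upper bound.
for name, eta in schedules.items():
    assert all(eta(t) <= GAMMA_MAX for t in range(1000)), name
```

This reflects why the result covers non-monotonic policies as well: the bound constrains only the maximum step-size, not its shape over time.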