2019
DOI: 10.48550/arxiv.1912.02365
Preprint

Lower Bounds for Non-Convex Stochastic Optimization

Abstract: We lower bound the complexity of finding ε-stationary points (with gradient norm at most ε) using stochastic first-order methods. In a well-studied model where algorithms access smooth, potentially non-convex functions through queries to an unbiased stochastic gradient oracle with bounded variance, we prove that (in the worst case) any algorithm requires at least ε^{-4} queries to find an ε-stationary point. The lower bound is tight, and establishes that stochastic gradient descent is minimax optimal in this model. In…
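
To make the oracle model concrete, here is a minimal sketch (not the paper's construction) of SGD querying an unbiased stochastic gradient with bounded variance and counting oracle calls until the gradient norm falls below ε. The test objective, the noise level, and the use of the true gradient in the stopping check are illustrative assumptions only; in the oracle model itself, the algorithm sees nothing but stochastic gradients.

```python
import numpy as np

def sgd_until_stationary(grad, stoch_grad, x0, lr=0.01, eps=0.1, max_queries=10**6):
    """Fixed-step SGD; stops once the true gradient norm is at most eps.
    (The true-gradient check is purely for illustration; the oracle model
    only exposes stochastic gradients.)"""
    x = np.array(x0, dtype=float)
    queries = 0
    while np.linalg.norm(grad(x)) > eps and queries < max_queries:
        x = x - lr * stoch_grad(x)   # one stochastic-gradient oracle query
        queries += 1
    return x, queries

# Hypothetical smooth non-convex test function f(x) = sum(x_i^2 / (1 + x_i^2)).
grad = lambda x: 2 * x / (1 + x**2) ** 2
# Unbiased stochastic gradient oracle with bounded variance (sigma = 0.1).
stoch_grad = lambda x: grad(x) + 0.1 * np.random.randn(*x.shape)

x_hat, n_queries = sgd_until_stationary(grad, stoch_grad, x0=np.ones(10))
print(f"approximate stationary point reached after {n_queries} oracle queries")
```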

Cited by 72 publications (170 citation statements)
References 22 publications

“…It is the first analysis of stochastic algorithms for NC-PL minimax problems. The dependency on ε is optimal, because the lower complexity bound of Ω(ε^{-4}) for stochastic nonconvex optimization [Arjevani et al., 2019] still holds when considering f(x, y) = F(x) for some nonconvex function F(x). Even under the strictly stronger assumption of strong concavity in y, to the best of our knowledge, it is the first time that a vanilla stochastic GDA-type algorithm is shown to achieve O(ε^{-4}) sample complexity without either increasing the batch size as in [Lin et al., 2020a] or assuming Lipschitz continuity of f(·, y) and its Hessian as in [Chen et al., 2021b].…”
Section: Notations
Mentioning, confidence: 99%
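
As a rough sketch of the vanilla stochastic GDA-type update discussed above (not the cited paper's algorithm), the following alternates a stochastic descent step on x with a stochastic ascent step on y; the toy objective, oracles, and step sizes are all hypothetical.

```python
import numpy as np

def stochastic_gda(stoch_grad_x, stoch_grad_y, x0, y0,
                   lr_x=1e-3, lr_y=1e-2, iters=10_000):
    """Vanilla two-step-size stochastic gradient descent-ascent for
    min_x max_y f(x, y): one stochastic gradient query per variable
    per iteration, descent on x and ascent on y."""
    x, y = np.array(x0, dtype=float), np.array(y0, dtype=float)
    for _ in range(iters):
        gx = stoch_grad_x(x, y)   # unbiased estimate of grad_x f(x, y)
        gy = stoch_grad_y(x, y)   # unbiased estimate of grad_y f(x, y)
        x = x - lr_x * gx         # descent step on the min variable
        y = y + lr_y * gy         # ascent step on the max variable
    return x, y

# Hypothetical toy objective f(x, y) = y*sin(x) - 0.5*y**2 (non-convex in x,
# strongly concave in y), observed through additive gradient noise.
stoch_grad_x = lambda x, y: y * np.cos(x) + 0.1 * np.random.randn()
stoch_grad_y = lambda x, y: np.sin(x) - y + 0.1 * np.random.randn()
x_hat, y_hat = stochastic_gda(stoch_grad_x, stoch_grad_y, x0=1.0, y0=0.0)
```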
“…In our case, N remains constant since B_k benefits from warm-start. The faster rates of MRBO/VRBO (Yang et al., 2021) are obtained under the additional mean-squared smoothness assumption (Arjevani et al., 2019), which we do not investigate in the present work. This assumption makes it possible to achieve the improved complexity of O(ε^{-3/2} log(ε^{-1})).…”
Section: Complexity Analysis
Mentioning, confidence: 74%
“…The dependence on κ_L and κ_g for TTSA and AccBio is derived in Proposition 11 of Appendix A.4. The rate of MRBO/VRBO is obtained under the additional mean-squared smoothness assumption (Arjevani et al., 2019).…”
Section: General Setting and Main Assumptions
Mentioning, confidence: 99%
“…While this rate is optimal in the general case, it is known that one can obtain an improved rate of O(1/T^{1/3}) if the objective is an expectation over smooth losses [Fang et al., 2018, Zhou et al., 2018, Cutkosky and Orabona, 2019, Tran-Dinh et al., 2019]. Moreover, this rate was recently shown to be tight [Arjevani et al., 2019].…”
Section: Introduction
Mentioning, confidence: 94%
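
The improved O(1/T^{1/3}) rate rests on variance-reduced estimators that reuse samples across iterations under mean-squared smoothness. Below is a rough sketch in that spirit (a recursive-momentum estimator along the lines of the cited works, with a hypothetical oracle and fixed parameters rather than the tuned schedules those papers analyze).

```python
import numpy as np

def recursive_momentum_sgd(stoch_grad, sample, x0, lr=0.01, a=0.1, iters=10_000):
    """SGD driven by a recursive momentum (STORM-style) estimator:
        d_t = g(x_t; xi_t) + (1 - a) * (d_{t-1} - g(x_{t-1}; xi_t)),
    where both gradient evaluations in a step reuse the same sample xi_t.
    The shared sample is what lets mean-squared smoothness reduce variance."""
    x = np.array(x0, dtype=float)
    d = stoch_grad(x, sample())            # plain stochastic gradient to start
    for _ in range(iters):
        x_prev, x = x, x - lr * d
        xi = sample()                      # one fresh sample per iteration
        d = stoch_grad(x, xi) + (1 - a) * (d - stoch_grad(x_prev, xi))
    return x

# Hypothetical oracle: gradient of f(x) = 0.5*||x||^2 observed through a
# per-sample perturbation; the per-sample gradient is Lipschitz in x, so the
# mean-squared smoothness assumption holds for this toy example.
sample = lambda: np.random.randn(10)
stoch_grad = lambda x, xi: x + 0.1 * xi
x_hat = recursive_momentum_sgd(stoch_grad, sample, x0=np.ones(10))
```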
“…In the context of stochastic non-convex optimization with general smooth losses, it was shown in Ghadimi and Lan [2013] that SGD with an appropriately selected learning rate can obtain a rate of O(1/T^{1/4}) for finding an approximate stationary point, which is known to match the respective lower bound [Arjevani et al., 2019]. While the method of Ghadimi and Lan [2013] requires knowledge of the smoothness and variance parameters, recent works have shown that adaptive methods like AdaGrad are able to obtain this bound in a parameter-free manner, as well as to adapt to the variance of the problem [Li and Orabona, 2019, Ward et al., 2019, Reddi et al., 2018].…”
Section: Related Work
Mentioning, confidence: 99%
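
As a sketch of the kind of adaptive, parameter-free step size the statement refers to, the following implements a scalar AdaGrad-Norm-style rule; the function name, constants, and stopping condition are illustrative and do not reproduce the exact algorithms of the cited works.

```python
import numpy as np

def adagrad_norm(stoch_grad, x0, eta=1.0, b0=1e-8, iters=10_000):
    """Scalar AdaGrad-Norm-style SGD: the step size shrinks with the
    accumulated squared gradient norms, so no smoothness or variance
    parameters need to be known in advance."""
    x = np.array(x0, dtype=float)
    b_sq = b0 ** 2
    for _ in range(iters):
        g = stoch_grad(x)                  # one stochastic-gradient query
        b_sq += np.dot(g, g)               # accumulate squared gradient norm
        x = x - (eta / np.sqrt(b_sq)) * g  # adaptive step, no tuned learning rate
    return x
```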