2016
DOI: 10.1007/978-3-319-46128-1_50

Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Łojasiewicz Condition

Abstract: In 1963, Polyak proposed a simple condition that is sufficient to show a global linear convergence rate for gradient descent. This condition is a special case of the Łojasiewicz inequality proposed in the same year, and it does not require strong convexity (or even convexity). In this work, we show that this much-older Polyak-Łojasiewicz (PL) inequality is actually weaker than the main conditions that have been explored to show linear convergence rates without strong convexity over the last 25 years.…
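For readers skimming the abstract, a minimal LaTeX sketch of the standard argument it refers to (the notation is mine: L is the Lipschitz constant of ∇f, μ the PL constant, f* the optimal value):

```latex
% PL inequality: the gradient norm dominates the suboptimality
\tfrac{1}{2}\,\|\nabla f(x)\|^2 \;\ge\; \mu\bigl(f(x) - f^*\bigr)

% Descent lemma for gradient descent with step size 1/L on an L-smooth f
f(x_{k+1}) \;\le\; f(x_k) - \tfrac{1}{2L}\,\|\nabla f(x_k)\|^2

% Combining the two yields a global linear rate without (strong) convexity
f(x_{k+1}) - f^* \;\le\; \Bigl(1 - \tfrac{\mu}{L}\Bigr)\bigl(f(x_k) - f^*\bigr)
```

The contraction factor (1 − μ/L) is the kind of global linear rate usually obtained from strong convexity, derived here from the PL inequality alone.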

Cited by 736 publications (822 citation statements)
References 46 publications
“…Intuitively, this inequality means that the suboptimality of the iterates, measured by function values, can be bounded by gradient norms. The PL condition is also referred to as the gradient-dominated condition in the literature [4], and it is widely adopted in analyses of both convex and nonconvex optimization settings [1,7,8]. Examples of functions satisfying the PL condition include neural networks with one hidden layer, ResNets with linear activations, and objective functions in matrix factorization [8].…”
Section: B. Objective Functions With Polyak-Łojasiewicz Inequality
confidence: 99%
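As a concrete companion to the quoted statement, here is a small Python sketch (my own illustration, not taken from the cited works): a rank-deficient least-squares objective is convex but not strongly convex, yet it satisfies the PL inequality with μ equal to the smallest nonzero eigenvalue of AᵀA, which the script checks numerically at random points.

```python
import numpy as np

# Hypothetical example: least squares with a rank-deficient design matrix
# is convex but NOT strongly convex, yet it satisfies the PL inequality
#   (1/2) ||grad f(x)||^2 >= mu * (f(x) - f*),
# with mu the smallest nonzero eigenvalue of A^T A.

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 10)) @ np.diag([1.0] * 6 + [0.0] * 4)  # rank 6
b = rng.standard_normal(50)

def f(x):
    r = A @ x - b
    return 0.5 * (r @ r)

def grad(x):
    return A.T @ (A @ x - b)

# A minimizer via the pseudo-inverse gives the optimal value f*
x_star = np.linalg.pinv(A) @ b
f_star = f(x_star)

# PL constant: smallest nonzero eigenvalue of A^T A
eigvals = np.linalg.eigvalsh(A.T @ A)
mu = min(v for v in eigvals if v > 1e-8)

# Check the PL inequality at a few random points
for _ in range(5):
    x = rng.standard_normal(10)
    lhs = 0.5 * np.linalg.norm(grad(x)) ** 2
    rhs = mu * (f(x) - f_star)
    print(lhs >= rhs - 1e-9, lhs, rhs)
```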
“…As a further step, we consider objective functions satisfying a Polyak-Łojasiewicz (PL) condition, which is widely adopted in the nonconvex optimization literature. In this case, we derive O(1/t) convergence rates for SGD with t iterations, which also remove the boundedness assumption on gradients imposed in [1] to derive similar convergence rates. We introduce a zero-variance condition which allows us to derive linear convergence of SGD.…”
Section: Introduction
confidence: 99%
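A hedged LaTeX sketch of the recursion behind rates of this kind (a standard-style bound in my own notation, not the cited paper's exact statement): for an L-smooth f satisfying PL with constant μ, an unbiased stochastic gradient with variance at most σ², and step size η_t ≤ 1/L,

```latex
% Expected one-step progress of SGD under L-smoothness and the PL inequality
\mathbb{E}\bigl[f(x_{t+1}) - f^*\bigr]
  \;\le\; (1 - \mu\eta_t)\,\mathbb{E}\bigl[f(x_t) - f^*\bigr]
          \;+\; \tfrac{L\eta_t^2}{2}\,\sigma^2 .

% A decaying step size \eta_t = \Theta(1/t) unrolls to an O(1/t) rate, while a
% zero-variance regime (\sigma^2 = 0 at minimizers) with a constant step size
% \eta gives linear convergence:
%   \mathbb{E}[f(x_t) - f^*] \;\le\; (1 - \mu\eta)^t \bigl(f(x_0) - f^*\bigr).
```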
“…Another notion of stationarity, which is used in this paper (as well as other works including [37]), is defined as follows.…”
Section: Discussion on Remark 1: Consider the Optimization Problem
confidence: 99%
“…Put C = δ² − δ⁴/4 > 0. Using the notation y for the unit vector y = (1/√C)[α₂, α₃, …, α_n]^T ∈ R^{n−1} (see (19)), and B for the diagonal matrix with strictly positive diagonal elements…”
Section: Gradient Projection-Newton Methods (GPA3)
confidence: 99%