2021
DOI: 10.48550/arxiv.2110.03677
Preprint

Large Learning Rate Tames Homogeneity: Convergence and Balancing Effect

Abstract: Recent empirical advances show that training deep models with a large learning rate often improves generalization performance. However, theoretical justification for the benefits of a large learning rate is highly limited, due to the challenges of analysis. In this paper, we consider using Gradient Descent (GD) with a large learning rate on a homogeneous matrix factorization problem, i.e., $\min_{X,Y} \|A - XY\|_F^2$. We prove a convergence theory for constant large learning rates well beyond $2/L$, where $L$ is the largest eigenvalue of the Hessian at initialization…
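
As a quick illustration of this setup, here is a minimal sketch (not the paper's code; the target a, the learning rate eta, and the initial point are all illustrative assumptions) of GD on the scalar instance $f(x, y) = (a - xy)^2$, run with a constant learning rate above $2/L$ at a deliberately unbalanced initialization:

# Minimal sketch of the scalar case; all numerical values are assumptions.
a, eta = 1.0, 0.1      # eta > 2/L ≈ 0.0625, with L ≈ 2*x0^2 = 32 the largest
                       # Hessian eigenvalue of f at the initialization below
x, y = 4.0, 0.1        # unbalanced start: |x| >> |y|
for _ in range(500):
    r = a - x * y                        # residual a - xy
    gx, gy = -2.0 * r * y, -2.0 * r * x  # partial derivatives of (a - xy)^2
    x, y = x - eta * gx, y - eta * gy    # gradient descent step
print(f"xy = {x * y:.4f}, |x| - |y| = {abs(x) - abs(y):.4f}")

After an early oscillatory phase, the iterates settle near a balanced factorization: xy ≈ a while |x| - |y| shrinks from 3.9 to below 0.1 in magnitude, which is the "balancing" effect the abstract refers to.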

Cited by 2 publications (2 citation statements, both from 2022) | References 34 publications

“…Let A be the set of those $x_0$ such that, starting from these $x_0$, GD arrives at 0 after some steps. Because in the current case 0 is an unstable stationary point, it is easy to show that A contains only countably many points and hence has zero Lebesgue measure [20]. We omit the detailed proof here.…”
Section: A One-dimensional Analysis (mentioning, confidence: 93%)
“…We let A contain those $x_0$ such that, starting from these $x_0$, GD arrives at 0 after some steps. Because in the current case 0 is an unstable stationary point, it is easy to show that A contains only countably many points and hence has zero Lebesgue measure [18]. We omit the detailed proof here.…”
Section: A 1-D Analysis (mentioning, confidence: 93%)
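
The excerpts above assert that A is countable without giving the argument. A hedged reconstruction of the standard reasoning (our sketch, not the cited papers' own proof; we assume the GD map $g(x) = x - \eta f'(x)$ is a nonconstant polynomial, as it is for the polynomial objectives studied here):

$$A = \bigcup_{k \ge 1} g^{-k}(\{0\}),$$

and each preimage $g^{-k}(\{0\})$ is the zero set of the nonconstant polynomial $g^{\circ k}$, hence finite. A countable union of finite sets is countable, and every countable subset of $\mathbb{R}$ has Lebesgue measure zero.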