2015
DOI: 10.48550/arxiv.1503.02101
Preprint

Escaping From Saddle Points --- Online Stochastic Gradient for Tensor Decomposition

Abstract: We analyze stochastic gradient descent for optimizing non-convex functions. In many cases for non-convex functions the goal is to find a reasonable local minimum, and the main concern is that gradient updates are trapped in saddle points. In this paper we identify a strict saddle property for non-convex problems that allows for efficient optimization. Using this property we show that stochastic gradient descent converges to a local minimum in a polynomial number of iterations. To the best of our knowledge this is …
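
To make the mechanism in the abstract concrete, here is a minimal sketch (not the paper's algorithm or step-size schedule) of noisy gradient descent on a toy strict-saddle function: f(x, y) = x^4/4 - x^2/2 + y^2/2 has a strict saddle at the origin (Hessian eigenvalues -1 and +1) and local minima at (±1, 0). The function, step size, noise level, and iteration count are all illustrative assumptions.

```python
# Minimal sketch: noise lets gradient descent leave a strict saddle point.
# f(x, y) = x**4/4 - x**2/2 + y**2/2 has a strict saddle at (0, 0)
# and local minima at (+1, 0) and (-1, 0).
import numpy as np

def grad(w):
    x, y = w
    return np.array([x**3 - x, y])  # gradient of the toy function

rng = np.random.default_rng(0)
w = np.zeros(2)           # start exactly at the saddle, where the gradient is zero
eta, sigma = 0.05, 0.1    # step size and noise scale (illustrative choices)

for _ in range(2000):
    noise = sigma * rng.standard_normal(2)   # isotropic gradient noise
    w = w - eta * (grad(w) + noise)

print(w)  # ends near (+1, 0) or (-1, 0), i.e. one of the local minima
```

Without the noise term the iterate would stay at the saddle forever, since the gradient there is exactly zero; the added perturbation is what pushes it into the negative-curvature direction.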

Cited by 173 publications (48 citation statements); references 20 publications. Citing publications span 2016–2024.

Selected citation statements (ordered by relevance):

“…Combined with the results in [GHJY15, LSJR16] (see Theorem 2.3), we have Theorem 1.2 (Informal): with high probability, stochastic gradient descent on the regularized objective (1.2) will converge to a solution X such that XX^T = ZZ^T = M in polynomial time from any starting point.…”
Section: Results (supporting)
confidence: 65%
“…Our characterization of the structure in the objective function implies that (stochastic) gradient descent from an arbitrary starting point converges to a global minimum. This is because gradient descent converges to a local minimum [GHJY15, LSJR16], and every local minimum is also a global minimum.…”
Section: Introduction (mentioning)
confidence: 99%
“…In this regime, the batch gradient method often fails with random initialization. As is widely believed, stochastic algorithms are efficient at escaping bad local minima or saddle points in non-convex optimization because of their inherent noise [26, 60]. We observe numerically that IRWF and block IRWF from a random starting point still converge to the global minimum even with a very small sample size, close to the theoretical limit [57].…”
Section: Discussion (supporting)
confidence: 68%
“…Furthermore, our study of the local landscape of Fourier expansions reveals that the existence of saddle points is an obstacle to designing algorithms with theoretical guarantees. Previous research shows that gradient-based algorithms are particularly susceptible to saddle points [36]. Although the studies in [37, 38] indicate that stochastic gradient descent with random noise is enough to escape saddle points, they assume the strict saddle property, which does not hold in our case.…”
Section: Introduction (mentioning)
confidence: 83%
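
Several of the statements above concern gradient methods on low-rank matrix problems where every local minimum is global, so converging to a local minimum already solves the problem. As a rough illustration only (the regularizer of objective (1.2), the stochastic noise, and the step-size conditions of the cited theorems are omitted, and the matrix sizes, step size, and iteration count are assumptions), the sketch below runs plain gradient descent on f(X) = ||XX^T - M||_F^2 / 4 for a small synthetic M = ZZ^T, starting from a random point.

```python
# Rough sketch: gradient descent on f(X) = ||X X^T - M||_F^2 / 4 with M = Z Z^T.
# This is not the regularized objective (1.2) from the citing papers; it only
# illustrates recovery of M from a random start on a tiny synthetic instance.
import numpy as np

rng = np.random.default_rng(1)
n, r = 20, 3
Z = rng.standard_normal((n, r))
M = Z @ Z.T                               # ground-truth low-rank matrix

X = 0.1 * rng.standard_normal((n, r))     # random initialization
eta = 0.25 / np.linalg.norm(M, 2)         # illustrative step size

for _ in range(5000):
    G = (X @ X.T - M) @ X                 # gradient of ||X X^T - M||_F^2 / 4
    X = X - eta * G

print(np.linalg.norm(X @ X.T - M) / np.linalg.norm(M))  # relative error, close to 0
```

Because X is only identified up to an orthogonal transformation, the check compares XX^T with M rather than X with Z.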