2021
DOI: 10.1088/2632-2153/ac0615
Stochasticity helps to navigate rough landscapes: comparing gradient-descent-based algorithms in the phase retrieval problem

Abstract: In this paper we investigate how gradient-based algorithms such as gradient descent (GD), (multi-pass) stochastic GD, its persistent variant, and the Langevin algorithm navigate non-convex loss-landscapes and which of them is able to reach the best generalization error at limited sample complexity. We consider the loss landscape of the high-dimensional phase retrieval problem as a prototypical highly non-convex example. We observe that for phase retrieval the stochastic variants of GD are able to reach perfect…
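The setting the abstract compares can be illustrated with a minimal sketch of full-batch GD versus mini-batch SGD on the phase retrieval loss. This is not the paper's experimental setup or code; the dimensions, learning rate, batch size, and step count below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 20, 200                                  # signal dimension, number of measurements
x_star = rng.standard_normal(d) / np.sqrt(d)    # hidden signal
A = rng.standard_normal((n, d))                 # Gaussian sensing vectors
y = (A @ x_star) ** 2                           # phaseless (sign-less) measurements

def grad(x, idx):
    """Gradient of the batch loss (1/|idx|) * sum_i ((a_i^T x)^2 - y_i)^2."""
    p = A[idx] @ x
    return 4.0 * A[idx].T @ ((p ** 2 - y[idx]) * p) / len(idx)

def run(batch_size, lr=0.01, steps=4000):
    x = rng.standard_normal(d) / np.sqrt(d)     # random initialization
    for _ in range(steps):
        idx = rng.choice(n, size=batch_size, replace=False)
        x = x - lr * grad(x, idx)
    # recovery error up to the global sign symmetry x -> -x
    return min(np.linalg.norm(x - x_star), np.linalg.norm(x + x_star))

err_gd = run(batch_size=n)      # full-batch gradient descent
err_sgd = run(batch_size=10)    # mini-batch (multi-pass) SGD
```

The only difference between the two runs is the batch size: sampling a fresh mini-batch at each step injects the noise whose effect on navigating the rough landscape the paper studies.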

Cited by 18 publications (19 citation statements)
References 17 publications
“…tensor PCA [57,58], phase retrieval [59], and high-dimensional linear classifiers [60][61][62][63], but has yet to be developed for deep feature learning. By developing a self-consistent DMFT of deep NNs, we gain insight into how features evolve in the rich regime of network training, while retaining many pleasant analytic properties of the infinite width limit.…”
Section: Related Work
confidence: 99%
“…This is a powerful formalism, but the integral equations must usually be solved entirely numerically, which is itself a non-trivial task. For problems close to the present context (neural networks, generalized linear models, phase retrieval), DMFT has been developed in the recent works [42,43,44,45,46]. We think that comprehensively comparing this formalism with the present approach is an interesting open problem.…”
Section: Discussion
confidence: 96%
“…Unfortunately, our solution algorithms display convergence problems even at rather short times, thus preventing us from obtaining reliable numerical predictions for the jamming transition from DMFT. Similar problems were encountered in reference [17]. To obtain more insight into the problem, we then simulated the RLG in several dimensions ranging from d = 2 to d = 22.…”
Section: Discussion
confidence: 99%
“…The unknown function f (x) is then parametrized by a guess function g(x; θ), whose parameters θ can be learnt by minimizing a loss function accounting for the error made in labelling data from the training set. Also in this case the GD dynamics can be surprisingly complex [10][11][12][13][14][15][16][17].…”
Section: Introduction
confidence: 99%