2019
DOI: 10.48550/arxiv.1907.10300
Preprint

Sparse Optimization on Measures with Over-parameterized Gradient Descent

Lenaic Chizat

Abstract: Minimizing a convex function of a measure with a sparsity-inducing penalty is a typical problem arising, e.g., in sparse spikes deconvolution or two-layer neural networks training. We show that this problem can be solved by discretizing the measure and running nonconvex gradient descent on the positions and weights of the particles. For measures on a d-dimensional manifold and under some non-degeneracy assumptions, this leads to a global optimization algorithm with a complexity scaling as log(1/ε) in the desired accuracy […]
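
The approach described in the abstract (discretize the unknown measure into weighted particles, then run plain gradient descent on both the weights and the positions) can be illustrated on a toy problem. The sketch below applies it to a 1D sparse spikes deconvolution task with a Gaussian kernel; it is an illustrative reconstruction, not the author's code, and the kernel, step sizes, particle count and threshold are arbitrary choices.

    import numpy as np

    # Illustrative sketch (not the paper's code): over-parameterized particle
    # gradient descent on a toy 1D sparse deconvolution problem.
    rng = np.random.default_rng(0)
    t = np.linspace(0.0, 1.0, 200)          # observation grid
    sigma = 0.05                            # width of the Gaussian features

    def features(x):
        # each particle at position x contributes a Gaussian bump phi(x, .)
        return np.exp(-(t[None, :] - x[:, None]) ** 2 / (2 * sigma ** 2))

    # ground truth: a sparse signed measure with 3 spikes
    y = features(np.array([0.25, 0.50, 0.80])).T @ np.array([1.0, -0.7, 0.5])

    m, lam = 50, 0.01                       # number of particles, L1 penalty
    lr_w, lr_x = 0.5, 0.02                  # step sizes, hand-tuned for the toy
    x = rng.uniform(0.0, 1.0, m)            # particle positions
    w = np.zeros(m)                         # particle weights (signed)

    for _ in range(5000):
        residual = features(x).T @ w - y                        # model - data
        grad_w = features(x) @ residual / t.size + lam * np.sign(w)
        dphi_dx = features(x) * (t[None, :] - x[:, None]) / sigma ** 2
        grad_x = w * (dphi_dx @ residual) / t.size              # chain rule in x
        w -= lr_w * grad_w
        x -= lr_x * grad_x

    # particles with non-negligible weight should cluster near the true spikes
    keep = np.abs(w) > 0.05
    print(np.round(x[keep], 2), np.round(w[keep], 2))

Over-parameterization (many more particles than true spikes) is what, per the abstract's claim, lets this nonconvex descent behave like a global method for the underlying convex problem over measures.
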

Cited by 18 publications (58 citation statements). References 37 publications.

“…Our arguments can indeed be applied to situations where the parameter space can be factored as R+ × Θ where Θ is a compact Riemannian manifold without boundary, see Chizat (2019). For clarity, we limit ourselves to a parameter space R^p (which corresponds to Θ = S^{p−1}), while for S-ReLU this would correspond to Θ = S^{p−1} × S^{p−1}.…”
Section: 2-Homogeneous Neural Network (citation type: mentioning)
confidence: 99%
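
For context on the factorization mentioned in this snippet, here is a hedged sketch in illustrative notation (not quoted from the cited paper): writing each parameter of a 2-homogeneous unit φ in polar form splits the parameter space into a scale in R+ and a direction on a sphere, so the network becomes an integral against a measure on that sphere.

    % Polar factorization of a 2-homogeneous unit (illustrative notation; requires amsmath, amssymb)
    \[
      w_i = r_i\,\theta_i, \qquad (r_i,\theta_i) \in \mathbb{R}_+ \times \Theta, \qquad \Theta = \mathbb{S}^{p-1},
    \]
    \[
      \phi(w_i, x) = r_i^2\,\phi(\theta_i, x)
      \quad\Longrightarrow\quad
      f(x) = \frac{1}{m}\sum_{i=1}^{m} r_i^2\,\phi(\theta_i, x)
           = \int_{\Theta} \phi(\theta, x)\,\mathrm{d}\mu(\theta),
      \qquad \mu = \frac{1}{m}\sum_{i=1}^{m} r_i^2\,\delta_{\theta_i}.
    \]
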
“…A general drawback of those mean-field analyses is that they are mostly non-quantitative, both in terms of number of neurons and number of iterations. While some works have shown quantitative results by modifying the dynamics (Mei et al., 2019; Wei et al., 2019; Chizat, 2019), we do not take this path in order to stay close to the way neural networks are used in practice and because our numerical experiments suggest that those modifications are not necessary to obtain a good practical behavior. Finally, we stress that our analysis does not take place in the lazy training regime which consists of training dynamics that can be analyzed in a perturbative regime around the initialization (see, e.g., Li and Liang, 2018; Jacot et al., 2018; Du et al., 2019).…”
Section: Introduction (citation type: mentioning)
confidence: 99%
“…Firstly, to ensure the full support assumption, we can consider extending the neural birth-death dynamics [16, 51] to deep ResNets. Neural birth-death dynamics considers the gradient flow in the Wasserstein-Fisher-Rao space [19] rather than the Wasserstein space and ensures convergence.…”
Section: Discussion (citation type: mentioning)
confidence: 99%
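
As a rough sketch of the dynamics referred to here (the notation and the constants α, β are assumptions, not taken from the cited works): for a discrete measure, a Wasserstein-Fisher-Rao gradient flow transports particle positions (Wasserstein part) and rescales their weights multiplicatively (Fisher-Rao, i.e. birth-death, part), where V[μ] denotes the first variation of the objective.

    % Particle form of a Wasserstein-Fisher-Rao gradient flow (sketch; requires amsmath)
    \[
      \mu_t = \sum_{i=1}^{m} w_i(t)\,\delta_{x_i(t)}, \qquad
      \dot{x}_i = -\alpha\,\nabla V[\mu_t](x_i), \qquad
      \dot{w}_i = -\beta\, w_i\, V[\mu_t](x_i),
    \]
    % alpha, beta > 0 weight the transport (Wasserstein) and birth-death (Fisher-Rao) parts.
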
“…While there have been several proven global convergence guarantees for multilayer networks [17,21,20], understanding of the convergence rate is still lacking. Even in the shallow case, global convergence has been studied only for a type of sparsity-inducing regularization [5,4]. Unless the convergence rate for multilayer networks is generally perilous (an unlikely scenario in light of the experiments in [15]), our result is expected to be relevant.…”
Section: Introduction (citation type: mentioning)
confidence: 86%