2021
DOI: 10.1214/20-aihp1140

Mean-field Langevin dynamics and energy landscape of neural networks

Abstract: HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

Cited by 21 publications (29 citation statements)
References 48 publications
“…Recently, law of large numbers and central limit theorems have been established for neural networks with a single hidden layer [10,30,43,48,49,50]. For a single hidden layer, one can directly study the weak convergence of the empirical measure of the parameters.…”
Section: Introduction (mentioning)
confidence: 99%
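The excerpt above views a single-hidden-layer network through the empirical measure of its parameters. A minimal sketch of that viewpoint, with illustrative names and mean-field (1/N) scaling assumed rather than taken from the cited papers:

```python
import numpy as np

# A single-hidden-layer network in mean-field scaling,
#   f(x) = (1/N) * sum_i c_i * tanh(w_i . x),
# viewed as a linear functional of the empirical measure
#   m_N = (1/N) * sum_i delta_{(c_i, w_i)}.
# Studying f as N -> infinity then reduces to studying the weak
# convergence of m_N, as in the law-of-large-numbers results quoted above.

rng = np.random.default_rng(0)

def network_output(x, params):
    """Average the neuron outputs over the N parameter particles."""
    c, w = params            # c: shape (N,), w: shape (N, d)
    return np.mean(c * np.tanh(w @ x))

N, d = 1000, 3
params = (rng.normal(size=N), rng.normal(size=(N, d)))
x = rng.normal(size=d)

y = network_output(x, params)
print(float(y))
```

Because the output depends on the parameters only through their empirical average, doubling N refines the measure without changing the functional form, which is what makes the weak-convergence analysis tractable for a single hidden layer.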
“…As previously discussed, related limiting results for the single-layer neural network case have been investigated in [10,30,43,48,49,50]. In those papers, it is proven that as the number of hidden units and stochastic gradient descent steps, in the appropriate scaling, diverge to infinity, the empirical distribution of the neural network parameters converges to the weak solution of a non-local PDE.…”
Section: Introduction (mentioning)
confidence: 99%
“…In view of the aforementioned, two recent developments are relevant: sampling methods based on either optimal control or mean-field approaches [20,21], and the application of Monte Carlo methods from statistical mechanics, especially molecular dynamics, to problems in machine learning or Bayesian inference [16,33]. Some of the modern stochastic optimisation methods from machine learning, like ADAM, AdaGrad or RMSProp, adaptively control the learning rate so as to improve the convergence to a local minimum, but they also share many features with adaptive versions of the Langevin equation [17,25].…”
(mentioning)
confidence: 99%
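The structural similarity the excerpt points at can be made concrete. Below is an illustrative comparison, not the adaptive schemes of the cited papers [17,25]: one RMSProp step and one Euler step of the overdamped Langevin equation, on the toy objective f(x) = ||x||²/2. Both take a gradient step; RMSProp rescales it by a running second-moment estimate, while Langevin perturbs it with noise.

```python
import numpy as np

# Toy objective f(x) = 0.5 * ||x||^2, so grad f(x) = x.
def grad(x):
    return x

def rmsprop_step(x, v, lr=1e-2, rho=0.9, eps=1e-8):
    """One RMSProp step: gradient rescaled by a running RMS estimate."""
    g = grad(x)
    v = rho * v + (1 - rho) * g**2
    return x - lr * g / (np.sqrt(v) + eps), v

def langevin_step(x, rng, dt=1e-2, beta=10.0):
    """One Euler step of dX = -grad f(X) dt + sqrt(2/beta) dW."""
    g = grad(x)
    noise = rng.normal(size=x.shape)
    return x - dt * g + np.sqrt(2 * dt / beta) * noise

rng = np.random.default_rng(1)
x_rms, v = np.ones(2), np.zeros(2)
x_lan = np.ones(2)
for _ in range(500):
    x_rms, v = rmsprop_step(x_rms, v)
    x_lan = langevin_step(x_lan, rng)

# Both iterates approach the minimiser at the origin; the Langevin iterate
# keeps fluctuating around it at a scale set by the temperature 1/beta.
print(np.linalg.norm(x_rms), np.linalg.norm(x_lan))
```

The shared skeleton (gradient step plus a per-step modification) is the feature the excerpt attributes to both families; in adaptive Langevin methods the modification is itself driven by the dynamics rather than fixed in advance.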
“…In Hu et al. [9] it has been shown that the marginal law m_t converges towards m*, and this provides an algorithm to approximate the minimizer m*. Similar topics have been explored in [5,10,14].…”
Section: Introduction (mentioning)
confidence: 99%
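The convergence of the marginal law m_t can be illustrated with an interacting-free particle sketch. The assumptions here are mine, not the cited paper's: a linear potential functional F(m) = E_m[U] with U(x) = x²/2 and noise level σ = 1, for which the Langevin dynamics is an explicit SDE and the limiting law m* is the Gibbs measure ∝ exp(−2U/σ²), a Gaussian with variance σ²/2.

```python
import numpy as np

# Particle sketch of (mean-field) Langevin dynamics under illustrative
# assumptions: F(m) = E_m[U] with U(x) = x^2 / 2 and sigma = 1.
# Each particle follows  dX_t = -U'(X_t) dt + sigma dW_t,
# and the empirical law of the particles approximates the marginal law m_t,
# which converges to the Gibbs minimiser m* with density ∝ exp(-2U/sigma^2),
# i.e. a centred Gaussian with variance sigma^2 / 2 = 0.5.

rng = np.random.default_rng(2)
N, dt, n_steps, sigma = 5000, 0.01, 500, 1.0

x = np.zeros(N)                       # all particles start at the origin
for _ in range(n_steps):
    x += -x * dt + sigma * np.sqrt(dt) * rng.normal(size=N)

# After time n_steps * dt = 5 the empirical variance should be close to 0.5.
print(x.var())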
“…This reformulation is crucial, because the potential function F defined above is convex in the measure space. In this paper, as in [9,10], we shall add an entropy term H(m) in order to regularize the problem. The regularized problem reads…”
Section: Introduction (mentioning)
confidence: 99%
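The display following "The regularized problem reads" is cut off in the excerpt. A sketch of the likely form, using the convex potential F and the entropy term H(m) described above (notation assumed; the exact normalisation in [9,10] may differ):

```latex
\[
  \min_{m \in \mathcal{P}(\mathbb{R}^d)} \; V^{\sigma}(m)
  := F(m) + \frac{\sigma^2}{2}\, H(m),
  \qquad
  H(m) := \int_{\mathbb{R}^d} m(x)\,\log m(x)\, \mathrm{d}x .
\]
```

The entropy term makes the objective strictly convex on the measure space, so the regularized problem admits a unique minimizer m*, which is the target of the Langevin dynamics discussed in the previous excerpt.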