2020
DOI: 10.1137/18m1192184

Mean Field Analysis of Neural Networks: A Law of Large Numbers

Abstract: We analyze multi-layer neural networks in the asymptotic regime of simultaneously (A) large network sizes and (B) large numbers of stochastic gradient descent training iterations. We rigorously establish the limiting behavior of the multi-layer neural network output. The limit procedure is valid for any number of hidden layers and it naturally also describes the limiting behavior of the training loss. The ideas that we explore are to (a) take the limits of each hidden layer sequentially and (b) characterize th…
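A minimal sketch of the object such limit results concern, in generic single-hidden-layer notation assumed here rather than taken from the paper: the width-N network output is an average over its parameters, i.e. a functional of their empirical measure.

% Schematic notation (assumed, not the paper's): c^i are output weights, w^i hidden weights.
\[
g^N_\theta(x) = \frac{1}{N}\sum_{i=1}^{N} c^i\,\sigma(w^i \cdot x)
             = \int c\,\sigma(w \cdot x)\,\mu^N(dc,dw),
\qquad
\mu^N = \frac{1}{N}\sum_{i=1}^{N}\delta_{(c^i,\,w^i)} .
\]

The law of large numbers identifies a deterministic limit measure for µ^N as N → ∞, and the abstract's idea is to apply such a limit to each hidden layer sequentially.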

Cited by 96 publications (117 citation statements)
References 63 publications
“…In the N → ∞ limit, the NTK becomes deterministic and constant in time. This result explains why the generalization performance converges as N → ∞, a result previously obtained for single hidden layer neural networks using a different approach [32,33,34,35].…”
Section: Introduction (supporting)
confidence: 74%
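For context, a hedged sketch of the neural tangent kernel (NTK) referred to in this statement, in generic notation not taken from the cited works:

% Schematic NTK of a width-N network f^N with parameters theta(t) at training time t.
\[
\Theta^N_t(x,x') = \bigl\langle \nabla_\theta f^N_{\theta(t)}(x),\; \nabla_\theta f^N_{\theta(t)}(x') \bigr\rangle ,
\]
and the quoted claim is that Θ^N_t converges, as N → ∞, to a kernel that is deterministic (independent of the random initialization) and constant in the training time t.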
“…We mathematically analyze neural networks with a single hidden layer in the asymptotic regime of large network sizes and large numbers of stochastic gradient descent iterations. A law of large numbers was previously proven in [30], see also [27,29] for related results. This paper rigorously proves a central limit theorem (CLT) for the empirical distribution of the neural network parameters.…”
Section: Introduction (mentioning)
confidence: 64%
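Schematically, and in generic notation assumed here rather than quoted from the cited paper, the law of large numbers and the central limit theorem for the empirical distribution of the parameters take the form:

% mu^N_t: empirical measure of the N parameters at rescaled training time t (schematic).
\[
\mu^N_t \xrightarrow[N \to \infty]{} \bar{\mu}_t \quad \text{(LLN)},
\qquad
\sqrt{N}\,\bigl(\mu^N_t - \bar{\mu}_t\bigr) \xrightarrow[N \to \infty]{} \eta_t \quad \text{(CLT)},
\]
where η_t is a Gaussian-type fluctuation process; the √N scaling is the standard CLT rate and is stated here as an assumption, not as the cited paper's exact normalization.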
“…[30] proves the mean-field limit µ^N_p → μ̄ as N → ∞. The convergence theorems of [30] are summarized below.…”
Section: Law of Large Numbers (mentioning)
confidence: 99%
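As a rough indication of the kind of limit dynamics such mean-field theorems identify (a schematic sketch under assumed notation, not the precise statement of [30]), the limit measure typically evolves by a nonlinear transport equation driven by an effective potential built from the loss:

% Schematic McKean-Vlasov / transport-type evolution for the limit measure (assumed form).
\[
\partial_t \bar{\mu}_t = \nabla_\theta \cdot \bigl( \bar{\mu}_t \, \nabla_\theta V(\theta; \bar{\mu}_t) \bigr),
\]
where V(θ; μ̄) depends on the data distribution and on the current parameter distribution μ̄.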
“…Other recent applications that have motivated this work are global optimization [40], active media [3] and machine learning. Indeed, it has been shown recently [41,43] that "stochastic gradient descent", the optimization algorithm used in the training of neural networks, can be represented as the evolution of a particle system with interactions governed by a potential related to the objective function that is used to train the network. Several of the issues that we study here, such as phase transitions and the effect of nonconvexity, are of great interest in the context of the training of neural networks.…”
Section: Introduction (mentioning)
confidence: 99%
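A hedged sketch of the particle-system picture described in this quote, with the potentials V and U introduced here for illustration rather than taken from [41,43]: training a width-N network by noisy gradient descent can be viewed as N interacting particles θ^1_t, …, θ^N_t evolving as

% Schematic interacting-particle dynamics; V (confinement) and U (pairwise interaction)
% are effective potentials determined by the objective function (assumed notation).
\[
d\theta^i_t = -\nabla V\bigl(\theta^i_t\bigr)\,dt
              - \frac{1}{N}\sum_{j=1}^{N} \nabla U\bigl(\theta^i_t, \theta^j_t\bigr)\,dt
              + \sqrt{2\beta^{-1}}\, dW^i_t ,
\]
so that questions about phase transitions and the effect of nonconvexity for such mean-field systems translate into questions about the training dynamics of neural networks.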