2021
DOI: 10.48550/arxiv.2106.04013
Preprint

The Future is Log-Gaussian: ResNets and Their Infinite-Depth-and-Width Limit at Initialization

Abstract: Theoretical results show that neural networks can be approximated by Gaussian processes in the infinite-width limit. However, for fully connected networks, it has been previously shown that for any fixed network width, n, the Gaussian approximation gets worse as the network depth, d, increases. Given that modern networks are deep, this raises the question of how well modern architectures, like ResNets, are captured by the infinite-width limit. To provide a better approximation, we study ReLU ResNets in the inf…
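For context, a minimal sketch of the infinite-width Gaussian limit the abstract refers to (this is the classical single-hidden-layer result of Neal (1995), not the theorem proved in this paper; the notation below is chosen here for illustration): for
f_i(x) = b_i + \sum_{j=1}^{n} W_{ij}\,\phi(v_j^\top x + c_j), \qquad W_{ij} \sim \mathcal{N}(0, \sigma_w^2/n), \; b_i \sim \mathcal{N}(0, \sigma_b^2),
with i.i.d. first-layer parameters (v_j, c_j), the central limit theorem gives, as the width n \to \infty, convergence of f_i to a centered Gaussian process with kernel
K(x, x') = \sigma_b^2 + \sigma_w^2\,\mathbb{E}_{v,c}\big[\phi(v^\top x + c)\,\phi(v^\top x' + c)\big].
The paper's point is that this Gaussian approximation degrades as depth grows at fixed width, which motivates studying a joint depth-and-width limit instead.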

Cited by 2 publications (2 citation statements) | References 25 publications (40 reference statements)
“…Based on the Gaussian process (GP; Neal, 1995; Lee et al., 2018; de G. Matthews, Rowland, Hron, Turner, & Ghahramani, 2018) property of deep neural networks, Jacot et al. (2018) introduce the neural tangent kernel and describe the exact dynamics of a fully connected network's output under gradient-flow training in the overparameterized regime. This initial work has been followed by a series of studies, including more precise descriptions (Arora et al., 2019; Lee et al., 2019), generalizations to different initializations (Liu, Zhu, & Belkin, 2020; Sohl-Dickstein, Novak, Schoenholz, & Lee, 2020), and NTKs for other neural network architectures (Arora et al., 2019; Li et al., 2021; Luo, Xu, Ma, & Zhang, 2021).…”
Section: Neural Tangent Kernel (mentioning)
confidence: 99%
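For reference, the standard definition from Jacot et al. (2018) that this statement summarizes (notation chosen here, not quoted from the citing paper): for a network f(x; \theta) with parameters \theta, the neural tangent kernel is
\Theta(x, x') = \nabla_\theta f(x; \theta)^\top \nabla_\theta f(x'; \theta),
and under gradient flow on a squared loss over training inputs X with targets y, the outputs evolve as
\frac{d}{dt} f(x; \theta_t) = -\,\Theta_t(x, X)\,\big(f(X; \theta_t) - y\big).
In the infinite-width limit, \Theta_t remains at its initialization value throughout training, so the dynamics reduce to kernel regression with a fixed kernel.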
“…It is thus natural to conjecture that, under appropriate scaling, the same approach applied to a NN whose weights are distributed as stable distributions would result in convergence to the solution of a Lévy-driven SDE. A more recent line of research focuses on taking joint limits in width and depth (Li et al., 2021). Here, the theory is less developed, and a formal result along the lines of Theorem 1 is lacking.…”
Section: Depth Limits (mentioning)
confidence: 99%
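For orientation, a minimal form of the ReLU ResNet recursion studied in such joint limits (the precise branch scaling and the formal statements are those of the paper itself; the form below is an illustrative assumption): with i.i.d. weights W^{(\ell)}_{ij} \sim \mathcal{N}(0, \sigma^2/n),
x^{(\ell+1)} = x^{(\ell)} + W^{(\ell+1)} \phi(x^{(\ell)}), \qquad \ell = 0, \dots, d-1, \quad \phi(u) = \max(u, 0),
and the limit is taken jointly, n, d \to \infty with the ratio d/n held (approximately) fixed, rather than sequentially as in the width-only GP/NTK limits; in this regime the paper's title refers to log-Gaussian, rather than Gaussian, behavior of the network at initialization.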