2021
DOI: 10.48550/arxiv.2106.04013
Preprint

The Future is Log-Gaussian: ResNets and Their Infinite-Depth-and-Width Limit at Initialization

Abstract: Theoretical results show that neural networks can be approximated by Gaussian processes in the infinite-width limit. However, for fully connected networks, it has been previously shown that for any fixed network width, n, the Gaussian approximation gets worse as the network depth, d, increases. Given that modern networks are deep, this raises the question of how well modern architectures, like ResNets, are captured by the infinite-width limit. To provide a better approximation, we study ReLU ResNets in the inf…
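For context, a minimal sketch of the infinite-width Gaussian limit the abstract refers to (this is the classical single-hidden-layer result of Neal (1995), not the theorem proved in this paper; the notation below is chosen here for illustration): for
f_i(x) = b_i + \sum_{j=1}^{n} W_{ij}\,\phi(v_j^\top x + c_j), \qquad W_{ij} \sim \mathcal{N}(0, \sigma_w^2/n), \; b_i \sim \mathcal{N}(0, \sigma_b^2),
with i.i.d. first-layer parameters (v_j, c_j), the central limit theorem gives, as the width n \to \infty, convergence of f_i to a centered Gaussian process with kernel
K(x, x') = \sigma_b^2 + \sigma_w^2\,\mathbb{E}_{v,c}\big[\phi(v^\top x + c)\,\phi(v^\top x' + c)\big].
The paper's point is that this Gaussian approximation degrades as depth grows at fixed width, which motivates studying a joint depth-and-width limit instead.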

Cited by 2 publications (2 citation statements) | References 25 publications (40 reference statements)
“…Based on the Gaussian process (GP; Neal, 1995; Lee et al., 2018; de G. Matthews, Rowland, Hron, Turner, & Ghahramani, 2018) property of deep neural networks, Jacot et al. (2018) introduce the neural tangent kernel and describe the exact dynamics of a fully connected network's output under gradient-flow training in the overparameterized regime. This initial work has been followed by a series of studies, including more precise descriptions (Arora et al., 2019; Lee et al., 2019), generalizations to different initializations (Liu, Zhu, & Belkin, 2020; Sohl-Dickstein, Novak, Schoenholz, & Lee, 2020), and NTKs for other neural network architectures (Arora et al., 2019; Li et al., 2021; Luo, Xu, Ma, & Zhang, 2021).…”
Section: Neural Tangent Kernel (mentioning)
confidence: 99%
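For reference, the standard definition from Jacot et al. (2018) that this statement summarizes (notation chosen here, not quoted from the citing paper): for a network f(x; \theta) with parameters \theta, the neural tangent kernel is
\Theta(x, x') = \nabla_\theta f(x; \theta)^\top \nabla_\theta f(x'; \theta),
and under gradient flow on a squared loss over training inputs X with targets y, the outputs evolve as
\frac{d}{dt} f(x; \theta_t) = -\,\Theta_t(x, X)\,\big(f(X; \theta_t) - y\big).
In the infinite-width limit, \Theta_t remains at its initialization value throughout training, so the dynamics reduce to kernel regression with a fixed kernel.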
“…It is thus natural to conjecture that, under appropriate scaling, the same approach applied to a NN whose weights are distributed as stable distributions would result in convergence to the solution of a Lévy-driven SDE. A more recent line of research focuses on taking joint limits in width and depth (Li et al., 2021). Here, the theory is less developed, and a formal result along the lines of Theorem 1 is lacking.…”
Section: Depth Limits (mentioning)
confidence: 99%
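For orientation, a minimal form of the ReLU ResNet recursion studied in such joint limits (the precise branch scaling and the formal statements are those of the paper itself; the form below is an illustrative assumption): with i.i.d. weights W^{(\ell)}_{ij} \sim \mathcal{N}(0, \sigma^2/n),
x^{(\ell+1)} = x^{(\ell)} + W^{(\ell+1)} \phi(x^{(\ell)}), \qquad \ell = 0, \dots, d-1, \quad \phi(u) = \max(u, 0),
and the limit is taken jointly, n, d \to \infty with the ratio d/n held (approximately) fixed, rather than sequentially as in the width-only GP/NTK limits; in this regime the paper's title refers to log-Gaussian, rather than Gaussian, behavior of the network at initialization.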