Neural tangent kernel: convergence and generalization in neural networks (invited paper)

Jacot, Arthur Paul; Gabriel, Franck; Hongler, Clément

doi:10.1145/3406325.3465355

Cited by 864 publications

(1,922 citation statements)

References 5 publications

Supporting

Mentioning

1,889

Contrasting

“…We now take the continuous-time limit η → 0, and introduce the standard notation $\lim_{\eta\to 0}\prod_{t=0}^{T}\int_{-\infty}^{\infty} dh_t =: \mathcal{D}h$ and $\lim_{\eta\to 0}\prod_{t=0}^{T}\int_{-i\infty}^{i\infty}\frac{dz_t}{2\pi i} =: \mathcal{D}z$ for the path integrals over the real h and complex z fields.¹² Within the exponential, we have $\lim_{\eta\to 0}\sum_t h_t\,\eta = \int dt\, h(t)$…”

Section: Constructing the Partition Function (mentioning)

confidence: 99%

“…, n} need not be time-ordered. ¹² Formally, this simply amounts to recovering (1) from the Itô discretization (3). Note that while there is no obvious continuum limit in the neural index, there is a sensible continuum limit in the temporal/layer index (recall that we work at T → ∞), and it is the latter we are considering here, hence the N-component fields that give rise to the analogies with the O(N) vector model below; we thank Dan and Sho for discussions on this point.…”

Section: Constructing the Partition Function (mentioning)

confidence: 99%

“…Recall that the function f(t) appearing in the partition function (12) depends on the trainable parameters A, B, W, U, b, cf. (2).…”

Section: Self-averaging Random Network (mentioning)

confidence: 99%

“…where $\rho(X_{ij})$ for each $X \in \{A, B, W, U\}$ are the normalized Gaussian probability density functions (16), and similarly for $\rho(b_i)$, which we have absorbed into the measures $\mathcal{D}X$, $\mathcal{D}b$ for compactness. Substituting the rule (2) into the partition function (12) and integrating over the disorder X and the bias b, we obtain $Z[j,\,] = \int \mathcal{D}h\,\mathcal{D}z\; e^{S_0 + S_{\text{source}} + S_{\text{int}}}$, (18) where…”

Section: Self-averaging Random Network (mentioning)

confidence: 99%

“…While the infinite-width limit provides an analytically tractable approximation that has led to important progress (see, e.g., [12][13][14][15]), it fails to capture crucial aspects of real-world networks which must of necessity be of finite width. For example, the lack of interactions (i.e., intralayer correlations) in the Gaussian limit implies that representations in these idealized networks do not evolve during gradient-based learning [2,16].…”

Section: Introduction (mentioning)

confidence: 99%

The edge of chaos: quantum field theory and deep neural networks

Jefferson

2022

SciPost Phys.

We explicitly construct the quantum field theory corresponding to a general class of deep neural networks encompassing both recurrent and feedforward architectures. We first consider the mean-field theory (MFT) obtained as the leading saddlepoint in the action, and derive the condition for criticality via the largest Lyapunov exponent. We then compute the loop corrections to the correlation function in a perturbative expansion in the ratio of depth T to width N, and find a precise analogy with the well-studied O(N) vector model, in which the variance of the weight initializations plays the role of the 't Hooft coupling. In particular, we compute both the O(1) corrections quantifying fluctuations from typicality in the ensemble of networks, and the subleading O(T/N) corrections due to finite-width effects. These provide corrections to the correlation length that controls the depth to which information can propagate through the network, and thereby sets the scale at which such networks are trainable by gradient descent. Our analysis provides a first-principles approach to the rapidly emerging NN-QFT correspondence, and opens several interesting avenues to the study of criticality in deep neural networks.

REDS: Random ensemble deep spatial prediction

Daw

Wikle

2022

Environmetrics

There has been a great deal of recent interest in the development of spatial prediction algorithms for very large datasets and/or prediction domains. These methods have primarily been developed in the spatial statistics community, but there has been growing interest in the machine learning community for such methods, primarily driven by the success of deep Gaussian process regression approaches and deep convolutional neural networks. These methods are often computationally expensive to train and implement, and consequently there has been a resurgence of interest in random projections and deep learning models based on random weights, so-called reservoir computing methods. Here, we combine several of these ideas to develop the random ensemble deep spatial (REDS) approach to predict spatial data. The procedure uses random Fourier features as inputs to an extreme learning machine (a deep neural model with random weights), and with calibrated ensembles of outputs from this model based on different random weights, it provides a simple uncertainty quantification. The REDS method is demonstrated on simulated data and on a classic large satellite data set.
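
The pipeline named in this abstract (random Fourier features feeding a random-weight network, repeated over an ensemble) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the single tanh hidden layer (the abstract describes a deep model), the ridge-regressed readout, and all hyperparameter values are assumptions chosen for illustration, and the calibration step mentioned in the abstract is omitted.

```python
import numpy as np

def make_rff(dim, n_features, lengthscale, rng):
    """Draw random frequencies and phases for a cosine Fourier feature map."""
    freqs = rng.normal(scale=1.0 / lengthscale, size=(dim, n_features))
    phases = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return freqs, phases

def apply_rff(coords, rff):
    """Map coordinates of shape (n, dim) to features of shape (n, n_features)."""
    freqs, phases = rff
    return np.sqrt(2.0 / freqs.shape[1]) * np.cos(coords @ freqs + phases)

def fit_member(train_x, train_y, rng, n_features=100, n_hidden=200,
               lengthscale=0.3, ridge=1e-2):
    """Fit one ensemble member: random fixed hidden layer, ridge readout.

    Assumption: one hidden layer stands in for the deep random-weight model.
    Only the readout weights `beta` are estimated from data.
    """
    rff = make_rff(train_x.shape[1], n_features, lengthscale, rng)
    w = rng.normal(size=(n_features, n_hidden))
    b = rng.normal(size=n_hidden)
    hidden = np.tanh(apply_rff(train_x, rff) @ w + b)
    gram = hidden.T @ hidden + ridge * np.eye(n_hidden)
    beta = np.linalg.solve(gram, hidden.T @ train_y)
    return rff, w, b, beta

def predict_member(x, member):
    """Evaluate a fitted member at new coordinates."""
    rff, w, b, beta = member
    return np.tanh(apply_rff(x, rff) @ w + b) @ beta

def ensemble_predict(train_x, train_y, test_x, n_members=20, seed=0):
    """Return the ensemble mean and the (uncalibrated) spread across members."""
    rng = np.random.default_rng(seed)
    preds = np.stack([
        predict_member(test_x, fit_member(train_x, train_y, rng))
        for _ in range(n_members)
    ])
    return preds.mean(axis=0), preds.std(axis=0)
```

Because each member draws its own random features and hidden weights, only a linear solve is needed per member, which is what keeps this kind of approach cheap; the spread across members is a raw quantity that would still need calibration before being read as an uncertainty estimate.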

Deep Q‐learning: A robust control approach

Varga

Kulcsár

Chehreghani

2022

Intl J Robust & Nonlinear

This work aims at constructing a bridge between robust control theory and reinforcement learning. Although reinforcement learning has shown admirable results in complex control tasks, the agent's learning behavior is opaque. Meanwhile, system theory has several tools for analyzing and controlling dynamical systems. This article places deep Q-learning into a control-oriented perspective to study its learning dynamics with well-established techniques from robust control. An uncertain linear time-invariant model is formulated by means of the neural tangent kernel to describe learning. This novel approach allows giving conditions for stability (convergence) of the learning and enables the analysis of the agent's behavior in the frequency domain. The control-oriented approach makes it possible to formulate robust controllers that inject dynamical rewards as control input in the loss function to achieve better convergence properties. Three output-feedback controllers are synthesized: gain scheduling H₂, dynamical H∞, and fixed-structure H∞ controllers. Compared to traditional deep Q-learning techniques, which involve several heuristics, setting up the learning agent with a control-oriented tuning methodology is more transparent and has well-established literature. The proposed approach does not use a target network and randomized replay memory. The role of the target network is overtaken by the control input, which also exploits the temporal dependency of samples (as opposed to a randomized memory buffer). Numerical simulations in different OpenAI Gym environments suggest that the H∞-controlled learning can converge faster and receive higher scores (depending on the environment) compared to the benchmark double deep Q-learning.
