2018
DOI: 10.48550/arxiv.1806.07572
Preprint
Neural Tangent Kernel: Convergence and Generalization in Neural Networks

Cited by 162 publications (357 citation statements)
References 0 publications
“…The investigation of neural network behaviour in the extreme width limit is also relevant to our work (Neal, 1996; Lee et al., 2018; Jacot et al., 2018; Matthews et al., 2018; Lee et al., 2019; Novak et al., 2018; Garriga-Alonso et al., 2018; Allen-Zhu et al., 2018; Khan et al., 2019; Agrawal et al., 2020), which establishes some equivalence between neural networks and Gaussian processes (GPs) (Rasmussen, 2003). Notably, in the case of one hidden layer, the equivalence between infinitely-wide BNNs and GPs was originally discovered by Neal (1996).…”
Section: Related Work (mentioning)
confidence: 53%
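To make the one-hidden-layer equivalence mentioned in this excerpt concrete, here is a minimal sketch (not taken from any of the cited works) of the GP covariance induced by an infinitely wide ReLU hidden layer; the function name and the weight/bias scales sigma_w, sigma_b are illustrative assumptions.

```python
# Minimal sketch: GP covariance of an infinitely wide one-hidden-layer ReLU
# network (the Neal-1996-style limit), i.e. the order-1 arc-cosine kernel.
import jax.numpy as jnp

def nngp_relu_kernel(x1, x2, sigma_w=1.0, sigma_b=0.0):
    """Infinite-width GP covariance for a single ReLU hidden layer."""
    # Covariances of the pre-activations under Gaussian weights/biases.
    k11 = sigma_b**2 + sigma_w**2 * jnp.dot(x1, x1)
    k22 = sigma_b**2 + sigma_w**2 * jnp.dot(x2, x2)
    k12 = sigma_b**2 + sigma_w**2 * jnp.dot(x1, x2)
    # Angle between the two pre-activations; clip for numerical safety.
    cos_theta = jnp.clip(k12 / jnp.sqrt(k11 * k22), -1.0, 1.0)
    theta = jnp.arccos(cos_theta)
    # Closed form of E[relu(u) relu(v)] for jointly Gaussian (u, v).
    return jnp.sqrt(k11 * k22) / (2 * jnp.pi) * (jnp.sin(theta) + (jnp.pi - theta) * cos_theta)
```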
“…While finding such a metric is in general not easy to do, we show that it is possible in the case of gradient descent for kernel ridge regression [Shawe-Taylor et al., 2004]. Kernel methods (which are inherently linear) can be used to derive insights into nonlinear systems such as deep neural networks [Jacot et al., 2018, Lee et al., 2019, Fort et al., 2020, Canatar et al., 2021]. Without loss of generality, we assume an element-wise feature map such that for a matrix X ∈ ℝ^{q×z}, the matrix φ(X) ∈ ℝ^{q×z} satisfies φ(X)_ij = φ(X_ij).…”
Section: Picking the Best Metric For Kernel Regression (mentioning)
confidence: 90%
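As a rough illustration of the setup quoted above, the following sketch implements closed-form kernel ridge regression with an element-wise feature map; the choice of tanh for φ and the regularization strength lam are assumptions for illustration, not the cited paper's code.

```python
# Minimal sketch: kernel ridge regression where the feature map is applied
# entry-by-entry, phi(X)_ij = phi(X_ij).
import jax.numpy as jnp

def phi(X):
    # Element-wise feature map; tanh is just an example choice.
    return jnp.tanh(X)

def krr_fit_predict(X_train, y_train, X_test, lam=1e-3):
    """Closed-form kernel ridge regression with a linear kernel on phi(X)."""
    F_train, F_test = phi(X_train), phi(X_test)
    K = F_train @ F_train.T                                   # (n, n) Gram matrix
    alpha = jnp.linalg.solve(K + lam * jnp.eye(K.shape[0]), y_train)
    return (F_test @ F_train.T) @ alpha                       # test predictions
```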
“…In every experiment, the learning agent is a 2-layer fully-connected ReLU network with 2500 neurons, bias terms, and appropriate input-output sizes. This is to comply with the assumptions in [21], i.e. a shallow and wide neural network.…”
Section: Methods (mentioning)
confidence: 95%
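A minimal sketch of the kind of agent described in this excerpt, assuming a single hidden layer of width 2500 with bias terms; the initialization scale, input dimension, and output size are illustrative assumptions rather than the cited paper's exact setup.

```python
# Minimal sketch: a wide, shallow fully-connected ReLU network with biases.
import jax
import jax.numpy as jnp

def init_params(key, in_dim, width=2500, out_dim=1):
    k1, k2 = jax.random.split(key)
    return {
        "W1": jax.random.normal(k1, (width, in_dim)) / jnp.sqrt(in_dim),
        "b1": jnp.zeros(width),
        "W2": jax.random.normal(k2, (out_dim, width)) / jnp.sqrt(width),
        "b2": jnp.zeros(out_dim),
    }

def forward(params, x):
    h = jax.nn.relu(params["W1"] @ x + params["b1"])   # single wide hidden layer
    return params["W2"] @ h + params["b2"]
```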
“…In order to formulate the model, we first introduce the NTK alongside some of its relevant properties [21]. Given data x, x′ ∈ 𝒳 ⊆ ℝ^n, the NTK of an n-input, 1-output artificial neural network f(x, θ(t)) : ℝ^n → ℝ, parametrized with θ(t), is…”
Section: Control-oriented Modeling Of Deep Q-learning (mentioning)
confidence: 99%
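The truncated definition above is the standard empirical NTK, Θ(x, x′) = ⟨∇_θ f(x, θ), ∇_θ f(x′, θ)⟩. The sketch below computes it for a small stand-in scalar-output network; the architecture and sizes are assumptions for illustration, not taken from [21].

```python
# Minimal sketch: empirical NTK Theta(x, x') = <grad_theta f(x), grad_theta f(x')>.
import jax
import jax.numpy as jnp

def f(theta, x):
    # Tiny scalar-output ReLU network standing in for f(x, theta): R^n -> R.
    h = jax.nn.relu(theta["W1"] @ x + theta["b1"])
    return (theta["W2"] @ h + theta["b2"])[0]

def empirical_ntk(theta, x1, x2):
    g1 = jax.grad(f)(theta, x1)   # gradient w.r.t. all parameters, as a pytree
    g2 = jax.grad(f)(theta, x2)
    leaves1 = jax.tree_util.tree_leaves(g1)
    leaves2 = jax.tree_util.tree_leaves(g2)
    return sum(jnp.vdot(a, b) for a, b in zip(leaves1, leaves2))

# Usage example with an arbitrary small network and two inputs.
key = jax.random.PRNGKey(0)
n, width = 4, 64
theta = {
    "W1": jax.random.normal(key, (width, n)) / jnp.sqrt(n),
    "b1": jnp.zeros(width),
    "W2": jax.random.normal(key, (1, width)) / jnp.sqrt(width),
    "b2": jnp.zeros(1),
}
x1, x2 = jnp.ones(n), jnp.arange(n, dtype=jnp.float32)
print(empirical_ntk(theta, x1, x2))
```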