2018
DOI: 10.48550/arxiv.1806.07572
Preprint
Neural Tangent Kernel: Convergence and Generalization in Neural Networks

Cited by 162 publications (357 citation statements)
References 0 publications
“…The investigation of neural network behaviour in the extreme width limit is also relevant to our work (Neal, 1996; Lee et al., 2018; Jacot et al., 2018; Matthews et al., 2018; Lee et al., 2019; Novak et al., 2018; Garriga-Alonso et al., 2018; Allen-Zhu et al., 2018; Khan et al., 2019; Agrawal et al., 2020), which establishes some equivalence between neural networks and Gaussian processes (GPs) (Rasmussen, 2003). Notably, in the case of one hidden layer, the equivalence between infinitely-wide BNNs and GPs was originally discovered by Neal (1996).…”
Section: Related Work (mentioning)
confidence: 53%
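To make the one-hidden-layer equivalence mentioned in this excerpt concrete, here is a minimal sketch (not taken from any of the cited works) of the GP covariance induced by an infinitely wide ReLU hidden layer; the function name and the weight/bias scales sigma_w, sigma_b are illustrative assumptions.

```python
# Minimal sketch: GP covariance of an infinitely wide one-hidden-layer ReLU
# network (the Neal-1996-style limit), i.e. the order-1 arc-cosine kernel.
import jax.numpy as jnp

def nngp_relu_kernel(x1, x2, sigma_w=1.0, sigma_b=0.0):
    """Infinite-width GP covariance for a single ReLU hidden layer."""
    # Covariances of the pre-activations under Gaussian weights/biases.
    k11 = sigma_b**2 + sigma_w**2 * jnp.dot(x1, x1)
    k22 = sigma_b**2 + sigma_w**2 * jnp.dot(x2, x2)
    k12 = sigma_b**2 + sigma_w**2 * jnp.dot(x1, x2)
    # Angle between the two pre-activations; clip for numerical safety.
    cos_theta = jnp.clip(k12 / jnp.sqrt(k11 * k22), -1.0, 1.0)
    theta = jnp.arccos(cos_theta)
    # Closed form of E[relu(u) relu(v)] for jointly Gaussian (u, v).
    return jnp.sqrt(k11 * k22) / (2 * jnp.pi) * (jnp.sin(theta) + (jnp.pi - theta) * cos_theta)
```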
“…While finding such a metric is in general not easy to do, we show that it is possible in the case of gradient descent for kernel ridge regression [Shawe-Taylor et al., 2004]. Kernel methods (which are inherently linear) can be used to derive insights into nonlinear systems such as deep neural networks [Jacot et al., 2018, Lee et al., 2019, Fort et al., 2020, Canatar et al., 2021]. Without loss of generality, we assume an element-wise feature map such that for a matrix X ∈ ℝ^{q×z}, the matrix φ(X) ∈ ℝ^{q×z} satisfies φ(X)_ij = φ(X_ij).…”
Section: Picking the Best Metric For Kernel Regression (mentioning)
confidence: 90%
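As a rough illustration of the setup quoted above, the following sketch implements closed-form kernel ridge regression with an element-wise feature map; the choice of tanh for φ and the regularization strength lam are assumptions for illustration, not the cited paper's code.

```python
# Minimal sketch: kernel ridge regression where the feature map is applied
# entry-by-entry, phi(X)_ij = phi(X_ij).
import jax.numpy as jnp

def phi(X):
    # Element-wise feature map; tanh is just an example choice.
    return jnp.tanh(X)

def krr_fit_predict(X_train, y_train, X_test, lam=1e-3):
    """Closed-form kernel ridge regression with a linear kernel on phi(X)."""
    F_train, F_test = phi(X_train), phi(X_test)
    K = F_train @ F_train.T                                   # (n, n) Gram matrix
    alpha = jnp.linalg.solve(K + lam * jnp.eye(K.shape[0]), y_train)
    return (F_test @ F_train.T) @ alpha                       # test predictions
```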
“…In every experiment, the learning agent is a 2-layer fully-connected ReLU network with 2500 neurons, bias terms, and appropriate input-output sizes. This is to comply with the assumptions in [21], i.e. a shallow and wide neural network.…”
Section: Methods (mentioning)
confidence: 95%
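A minimal sketch of the kind of agent described in this excerpt, assuming a single hidden layer of width 2500 with bias terms; the initialization scale, input dimension, and output size are illustrative assumptions rather than the cited paper's exact setup.

```python
# Minimal sketch: a wide, shallow fully-connected ReLU network with biases.
import jax
import jax.numpy as jnp

def init_params(key, in_dim, width=2500, out_dim=1):
    k1, k2 = jax.random.split(key)
    return {
        "W1": jax.random.normal(k1, (width, in_dim)) / jnp.sqrt(in_dim),
        "b1": jnp.zeros(width),
        "W2": jax.random.normal(k2, (out_dim, width)) / jnp.sqrt(width),
        "b2": jnp.zeros(out_dim),
    }

def forward(params, x):
    h = jax.nn.relu(params["W1"] @ x + params["b1"])   # single wide hidden layer
    return params["W2"] @ h + params["b2"]
```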
“…In order to formulate the model, we first introduce the NTK alongside some of its relevant properties [21]. Given data x, x′ ∈ 𝒳 ⊆ ℝ^n, the NTK of an n-input, 1-output artificial neural network f(x, θ(t)) : ℝ^n → ℝ, parametrized with θ(t), is…”
Section: Control-oriented Modeling Of Deep Q-learning (mentioning)
confidence: 99%
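The truncated definition above is the standard empirical NTK, Θ(x, x′) = ⟨∇_θ f(x, θ), ∇_θ f(x′, θ)⟩. The sketch below computes it for a small stand-in scalar-output network; the architecture and sizes are assumptions for illustration, not taken from [21].

```python
# Minimal sketch: empirical NTK Theta(x, x') = <grad_theta f(x), grad_theta f(x')>.
import jax
import jax.numpy as jnp

def f(theta, x):
    # Tiny scalar-output ReLU network standing in for f(x, theta): R^n -> R.
    h = jax.nn.relu(theta["W1"] @ x + theta["b1"])
    return (theta["W2"] @ h + theta["b2"])[0]

def empirical_ntk(theta, x1, x2):
    g1 = jax.grad(f)(theta, x1)   # gradient w.r.t. all parameters, as a pytree
    g2 = jax.grad(f)(theta, x2)
    leaves1 = jax.tree_util.tree_leaves(g1)
    leaves2 = jax.tree_util.tree_leaves(g2)
    return sum(jnp.vdot(a, b) for a, b in zip(leaves1, leaves2))

# Usage example with an arbitrary small network and two inputs.
key = jax.random.PRNGKey(0)
n, width = 4, 64
theta = {
    "W1": jax.random.normal(key, (width, n)) / jnp.sqrt(n),
    "b1": jnp.zeros(width),
    "W2": jax.random.normal(key, (1, width)) / jnp.sqrt(width),
    "b2": jnp.zeros(1),
}
x1, x2 = jnp.ones(n), jnp.arange(n, dtype=jnp.float32)
print(empirical_ntk(theta, x1, x2))
```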