Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing 2021
DOI: 10.1145/3406325.3465355
Neural tangent kernel: convergence and generalization in neural networks (invited paper)

Abstract: At initialization, artificial neural networks (ANNs) are equivalent to Gaussian processes in the infinite-width limit (14; 11), thus connecting them to kernel methods. We prove that the evolution of an ANN during training can also be described by a kernel: during gradient descent on the parameters of an ANN, the network function f_θ (which maps input vectors to output vectors) follows the kernel gradient of the functional cost (which is convex, in contrast to the parameter cost) w.r.t. a new kernel: the Neural Tangent Kernel (NTK).
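To make the kernel concrete, here is a minimal JAX sketch (not from the paper; the helper names init_mlp, f, and ntk_entry are hypothetical) of the finite-width empirical NTK, Θ(x₁, x₂) = ⟨∇_θ f_θ(x₁), ∇_θ f_θ(x₂)⟩, whose infinite-width limit is the kernel the abstract describes:

```python
# Minimal sketch (hypothetical helpers, toy architecture): the empirical
# Neural Tangent Kernel is the inner product of parameter gradients,
# Theta(x1, x2) = <grad_theta f(x1), grad_theta f(x2)>.
import jax
import jax.numpy as jnp

def init_mlp(key, sizes):
    """Random Gaussian MLP weights with 1/sqrt(fan-in) scaling."""
    params = []
    for din, dout in zip(sizes[:-1], sizes[1:]):
        key, sub = jax.random.split(key)
        params.append(jax.random.normal(sub, (din, dout)) / jnp.sqrt(din))
    return params

def f(params, x):
    """Scalar-output network function f_theta(x)."""
    h = x
    for w in params[:-1]:
        h = jnp.tanh(h @ w)
    return (h @ params[-1]).squeeze()

def ntk_entry(params, x1, x2):
    """One entry of the empirical NTK Gram matrix."""
    g1 = jax.grad(f)(params, x1)  # gradient w.r.t. all parameters
    g2 = jax.grad(f)(params, x2)
    pairs = zip(jax.tree_util.tree_leaves(g1), jax.tree_util.tree_leaves(g2))
    return sum(jnp.vdot(a, b) for a, b in pairs)

key = jax.random.PRNGKey(0)
params = init_mlp(key, [3, 512, 512, 1])
x1, x2 = jnp.ones(3), jnp.arange(3.0)
print(ntk_entry(params, x1, x2))
```

At large width this Gram matrix stays nearly constant during training, which is what lets gradient descent on the parameters θ be read as kernel gradient descent on the function f_θ.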

Citation Types: 32 supporting, 1,889 mentioning, 1 contrasting (1,922 total)
Cited by 864 publications (1,922 citation statements) · References 5 publications
“…We now take the continuous-time limit η → 0, and introduce the standard notation $\lim_{\eta\to 0}\prod_{t=0}^{T}\int_{-\infty}^{\infty}\mathrm{d}h_t =: \int\mathcal{D}h$ and $\lim_{\eta\to 0}\prod_{t=0}^{T}\int_{-i\infty}^{i\infty}\frac{\mathrm{d}z_t}{2\pi i} =: \int\mathcal{D}z$ for the path integrals over the real $h$ and complex $z$ fields.¹² Within the exponential, we have $\lim_{\eta\to 0}\sum_t h_t\,\eta = \int\mathrm{d}t\,h(t)$…”
Section: Constructing the Partition Function (mentioning, confidence: 99%)
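The step inside the exponential is a Riemann-sum limit; spelled out (a standard identity, not quoted from the source):

```latex
% Riemann-sum reading of the exponent: with step \eta and t_k = k\eta,
\lim_{\eta\to 0}\sum_{k=0}^{T/\eta} h_{t_k}\,\eta \;=\; \int_{0}^{T} h(t)\,\mathrm{d}t,
% while the product of integration measures becomes the path-integral measure
\qquad
\lim_{\eta\to 0}\prod_{k=0}^{T/\eta}\int_{-\infty}^{\infty}\mathrm{d}h_{t_k} \;=:\; \int\!\mathcal{D}h .
```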
“…, n} need not be time-ordered.¹² Formally, this simply amounts to recovering (1) from the Itô discretization (3). Note that while there is no obvious continuum limit in the neural index, there is a sensible continuum limit in the temporal/layer index (recall that we work at T → ∞), and it is the latter we are considering here; hence the N-component fields that give rise to the analogies with the O(N) vector model below. We thank Dan and Sho for discussions on this point.…”
Section: Constructing the Partition Function (mentioning, confidence: 99%)
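For readers outside this literature, here is a generic illustration of what "recovering (1) from the Itô discretization (3)" means; the cited work's actual equations (1) and (3) are not reproduced in this excerpt, so the drift F and noise below are hypothetical stand-ins:

```latex
% Hypothetical stand-in for the cited Eqs. (3) and (1): an Ito-discretized
% update with step \eta and i.i.d. Gaussian noise \xi_t,
h_{t+\eta} \;=\; h_t + \eta\,F(h_t) + \sqrt{\eta}\,\xi_t,
\qquad \xi_t \sim \mathcal{N}(0,1),
% recovers, as \eta \to 0, the continuum stochastic differential equation
\mathrm{d}h(t) \;=\; F\big(h(t)\big)\,\mathrm{d}t + \mathrm{d}W(t),
% with W(t) a standard Wiener process. The limit is taken in the
% temporal/layer index, not the neural index, as the quoted remark emphasizes.
```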