2020
DOI: 10.48550/arxiv.2007.05864
Preprint

Bayesian Deep Ensembles via the Neural Tangent Kernel

Abstract: We explore the link between deep ensembles and Gaussian processes (GPs) through the lens of the Neural Tangent Kernel (NTK): a recent development in understanding the training dynamics of wide neural networks (NNs). Previous work has shown that even in the infinite width limit, when NNs become GPs, there is no GP posterior interpretation to a deep ensemble trained with squared error loss. We introduce a simple modification to standard deep ensembles training, through addition of a computationally-tractable, ra…
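The abstract is truncated here, but the modification it describes appears closely related to randomised-prior-style ensembles, in which each member is trained against a fixed, randomly initialised "prior" network that never receives gradient updates (consistent with the "Randomized Prior Trick" citation statement further below). A minimal JAX sketch of that general idea follows; the architecture, prior_scale, step size, and toy data are illustrative assumptions, and this is not claimed to be the paper's exact construction.

import jax
import jax.numpy as jnp

def init_mlp(key, sizes):
    # Initialise a small MLP as a list of (W, b) pairs.
    params = []
    for d_in, d_out in zip(sizes[:-1], sizes[1:]):
        key, sub = jax.random.split(key)
        params.append((jax.random.normal(sub, (d_in, d_out)) / jnp.sqrt(d_in),
                       jnp.zeros(d_out)))
    return params

def mlp(params, x):
    # Forward pass: ReLU hidden layers, linear output.
    for W, b in params[:-1]:
        x = jax.nn.relu(x @ W + b)
    W, b = params[-1]
    return x @ W + b

def member_loss(params, prior_params, x, y, prior_scale=1.0):
    # Squared error of (trainable network + fixed, untrainable random "prior" network).
    preds = mlp(params, x) + prior_scale * mlp(prior_params, x)
    return jnp.mean((preds - y) ** 2)

# One ensemble member; in practice this is repeated with a different random key per member.
key = jax.random.PRNGKey(0)
k_train, k_prior = jax.random.split(key)
sizes = [1, 64, 64, 1]
params = init_mlp(k_train, sizes)        # trained by gradient descent
prior_params = init_mlp(k_prior, sizes)  # frozen: never updated

x = jnp.linspace(-1.0, 1.0, 32).reshape(-1, 1)
y = jnp.sin(3.0 * x)

for _ in range(100):  # plain full-batch gradient descent, illustrative step size
    grads = jax.grad(member_loss)(params, prior_params, x, y)
    params = jax.tree_util.tree_map(lambda p, g: p - 1e-2 * g, params, grads)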

Cited by 14 publications (23 citation statements). References 27 publications.
“…where γ_n → ∞ is a sequence with arbitrarily slow growth, we can verify that there exists an M > 0 such that both (40) and (43) converge to zero by noting that as n → ∞, mτ_m² → ∞ and τ_m² → 0. Hence, the term (I) converges to zero.…”
Section: B.2, Proof of Theorem 3.1 (mentioning)
confidence: 99%
“…While the algorithm can be directly applied to neural network models as in [23], we follow [40] and modify the objective to account for the difference between the neural tangent kernel (NTK) [41] of a wide neural network architecture and the NNGP kernel of the corresponding infinite-width Bayesian neural network [42][43][44]. Concretely, we modify (13) as…”
Section: Scalable Approximate Inference via a Randomized Prior Trick (mentioning)
confidence: 99%
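The distinction this statement draws, between the NTK of a wide architecture and the NNGP kernel of the corresponding infinite-width Bayesian network, can be computed directly. A minimal sketch assuming the neural_tangents library; the fully-connected architecture, widths, and dummy inputs are illustrative choices, not taken from the citing paper.

import jax.numpy as jnp
from jax import random
from neural_tangents import stax

# Infinite-width kernels for an arbitrary fully-connected ReLU architecture.
_, _, kernel_fn = stax.serial(
    stax.Dense(512), stax.Relu(),
    stax.Dense(512), stax.Relu(),
    stax.Dense(1),
)

x_train = random.normal(random.PRNGKey(0), (20, 5))
x_test = random.normal(random.PRNGKey(1), (8, 5))

# NNGP kernel: prior covariance of the infinite-width Bayesian neural network.
k_nngp = kernel_fn(x_test, x_train, 'nngp')
# NTK: the kernel governing gradient-descent training dynamics of the wide network.
k_ntk = kernel_fn(x_test, x_train, 'ntk')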
“…The most successful family of UQ methods so far in deep learning has been based on the Bayesian framework [26][27][28][29][30][31][32][33][34][35][36][37][38]. Alternative methods are based, indicatively, on ensembles of NN optimization iterates or independently trained NNs [39][40][41][42][43][44][45][46][47][48][49][50], as well as on the evidential framework [51][52][53][54][55][56][57][58][59]. Although Bayesian methods and ensembles are thoroughly discussed in this paper, the interested reader is also directed to the recent review studies in [60][61][62][63][64][65][66][67][68][69][70][71][72] for more information.…”
Section: Motivation and Scope of the Paper (mentioning)
confidence: 99%
“…Equivalently, based on the results of Lee et al. (2019), He et al. (2020) show that f_t can be expressed in terms of the random initialization f_0. For t → ∞ this gives:…”
Section: Neural Tangent Kernel (mentioning)
confidence: 99%
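The quoted statement cuts off before the display. As a hedged reconstruction, the standard result from Lee et al. (2019) that this passage appears to refer to, for squared-error loss under gradient flow on a wide (linearized) network, is

f_\infty(x) = f_0(x) + \Theta(x, \mathcal{X}) \, \Theta(\mathcal{X}, \mathcal{X})^{-1} \bigl( \mathcal{Y} - f_0(\mathcal{X}) \bigr),

where the training inputs \mathcal{X}, targets \mathcal{Y}, and NTK \Theta are assumed notation not taken from the quote. In words, the fully trained network equals its random initialisation f_0 plus an NTK kernel-regression correction fitted to the initial residuals \mathcal{Y} - f_0(\mathcal{X}).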