2021
DOI: 10.48550/arxiv.2102.06571
Preprint

Bayesian Neural Network Priors Revisited

Abstract: Isotropic Gaussian priors are the de facto standard for modern Bayesian neural network inference. However, such simplistic priors are unlikely to either accurately reflect our true beliefs about the weight distributions, or to give optimal performance. We study summary statistics of neural network weights in different networks trained using SGD. We find that fully connected networks (FCNNs) display heavy-tailed weight distributions, while convolutional neural network (CNN) weights display strong spatial correlations. […]
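The abstract's empirical claim rests on simple summary statistics of trained weights. As a rough illustration of the kind of measurement involved (a minimal sketch, not the paper's own code, assuming the trained weights are already available as NumPy arrays):

import numpy as np
from scipy import stats

def tail_statistics(weights):
    # Excess kurtosis well above 0 indicates heavier-than-Gaussian tails.
    w = np.asarray(weights).ravel()
    return {"std": float(np.std(w)),
            "excess_kurtosis": float(stats.kurtosis(w, fisher=True))}

def spatial_correlation(filters):
    # filters: trained conv weights of shape (out_channels, in_channels, k, k).
    out_ch, in_ch, k, _ = filters.shape
    flat = filters.reshape(out_ch * in_ch, k * k)   # one row per 2-D filter slice
    corr = np.corrcoef(flat, rowvar=False)          # correlations between spatial positions
    off_diag = corr[~np.eye(k * k, dtype=bool)]
    return float(off_diag.mean())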



Cited by 15 publications (37 citation statements) · References: 38 publications
“…Tran et al (2020) propose a new prior for Bayesian neural networks inspired by Gaussian processes (Rasmussen & Nickisch, 2010) based on this hypothesis. In concurrent work, Fortuin et al (2021) also explore several alternatives to standard Gaussian priors inspired by the cold posteriors effect. Wilson & Izmailov (2020), on the other hand, argue that vague Gaussian priors in the parameter space induce useful function-space priors.…”
Section: What Is the Effect of Priors in Bayesian Neural Networks? (mentioning)
confidence: 99%
“…(4) is used in calculations, but the data follows, for example, a Student-t distribution (see Section D.2.2 for an example). The prior p(θ), which controls, together with the NN architecture, the function space of our approximation, may also lead to sub-optimal results using the Bayesian paradigm [143].…”
Section: A4 Posterior Tempering for Model Misspecification (mentioning)
confidence: 99%
“…In this regard, a technique that is often used in practice is posterior tempering, i.e., sampling θ values from p(θ|D)^{1/τ} instead of the true posterior (τ = 1), where τ is called the temperature. Specifically, it has been reported in the literature that "cold" posteriors, τ < 1, perform better [146,148,149], although using a more informed prior can potentially remove this effect [143]. Cold posteriors can be interpreted as over-counting the available data using 1/τ replications of it, thus making the posterior more concentrated.…”
Section: A4 Posterior Tempering for Model Misspecification (mentioning)
confidence: 99%
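To make the tempering described in the quoted passage concrete, here is a minimal sketch of sampling from p(θ|D)^{1/τ} with a random-walk Metropolis step; the log_likelihood and log_prior callables are hypothetical placeholders, not taken from the cited work:

import numpy as np

def tempered_log_posterior(theta, data, log_likelihood, log_prior, tau):
    # Tempering the whole posterior: (1/tau) * [log p(D|theta) + log p(theta)].
    return (log_likelihood(theta, data) + log_prior(theta)) / tau

def metropolis_step(theta, data, log_likelihood, log_prior, tau=0.5, step=0.01,
                    rng=np.random.default_rng()):
    # One random-walk Metropolis step whose stationary distribution is
    # proportional to p(theta|D)^(1/tau); tau < 1 gives a "cold" posterior.
    proposal = theta + step * rng.standard_normal(theta.shape)
    log_alpha = (tempered_log_posterior(proposal, data, log_likelihood, log_prior, tau)
                 - tempered_log_posterior(theta, data, log_likelihood, log_prior, tau))
    return proposal if np.log(rng.uniform()) < log_alpha else theta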
“…Usually the weights in Bayesian neural networks are assumed to be independent (Neal, 1996;Matthews et al, 2018;Lee et al, 2018;Garriga-Alonso et al, 2019). However, some works (Garriga-Alonso and van der Wilk, 2021; Fortuin et al, 2021) proposed correlated priors for convolutional neural networks since trained weights are empirically strongly correlated. They showed that these correlated priors can improve overall performance.…”
Section: Dependence Properties (mentioning)
confidence: 99%
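As a rough illustration of the correlated priors the quoted passage refers to, the sketch below draws a k×k convolutional filter from a Gaussian prior whose covariance decays with spatial distance; the squared-exponential kernel and its lengthscale are illustrative assumptions, not the specific priors proposed in the cited papers:

import numpy as np

def sample_correlated_filter(k=3, lengthscale=1.0, variance=1.0, seed=0):
    # Coordinates of the k*k spatial positions in the filter.
    coords = np.stack(np.meshgrid(np.arange(k), np.arange(k), indexing="ij"),
                      axis=-1).reshape(-1, 2)
    # Squared-exponential covariance: nearby positions are strongly correlated.
    sq_dist = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)
    cov = variance * np.exp(-0.5 * sq_dist / lengthscale ** 2)
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(np.zeros(k * k), cov).reshape(k, k)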