2021
DOI: 10.48550/arXiv.2104.14421
Preprint

What Are Bayesian Neural Network Posteriors Really Like?

Pavel Izmailov, Sharad Vikram, Matthew D. Hoffman, et al.

Abstract: The posterior over Bayesian neural network (BNN) parameters is extremely high-dimensional and non-convex. For computational reasons, researchers approximate this posterior using inexpensive mini-batch methods such as mean-field variational inference or stochastic-gradient Markov chain Monte Carlo (SGMCMC). To investigate foundational questions in Bayesian deep learning, we instead use full-batch Hamiltonian Monte Carlo (HMC) on modern architectures. We show that (1) BNNs can achieve significant performance gain…
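As a rough illustration of the full-batch HMC the abstract refers to, below is a minimal numpy sketch of a single HMC transition (leapfrog integration plus a Metropolis correction). The log_prob/grad_log_prob callables and the step-size and leapfrog settings are placeholder assumptions, not the paper's implementation, which runs at far larger scale on modern architectures.

```python
import numpy as np

def hmc_step(theta, log_prob, grad_log_prob, step_size=1e-3, n_leapfrog=50, rng=None):
    """One full-batch HMC transition: leapfrog integration + Metropolis correction."""
    rng = rng or np.random.default_rng()
    p0 = rng.standard_normal(theta.shape)       # resample momentum
    q, p = theta.copy(), p0.copy()
    p += 0.5 * step_size * grad_log_prob(q)     # half step for momentum
    for _ in range(n_leapfrog - 1):
        q += step_size * p                      # full step for position
        p += step_size * grad_log_prob(q)       # full step for momentum
    q += step_size * p                          # final position step
    p += 0.5 * step_size * grad_log_prob(q)     # final half step for momentum
    # Hamiltonians: negative log posterior plus Gaussian kinetic energy
    h_old = -log_prob(theta) + 0.5 * p0 @ p0
    h_new = -log_prob(q) + 0.5 * p @ p
    if np.log(rng.uniform()) < h_old - h_new:   # Metropolis accept/reject
        return q, True
    return theta, False
```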

Cited by 27 publications (38 citation statements)
References 43 publications
“…We ran experiments on Bayesian NN regression, classification, logistic regression and ICA (Amari et al., 1996), reporting accuracies, log joints (Welling and Teh, 2011; Izmailov et al., 2021) and expected calibration error (ECE) (Guo et al., 2017). For details on exact experimental setups please see Appendix F. Across experiments we compare to SGLD as in (Izmailov et al., 2021). In the Bayesian NN tasks the likelihood is parametrised via p(y…”
Section: Results
confidence: 99%
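For reference, the expected calibration error (ECE) of Guo et al. (2017) cited in this excerpt can be computed from softmax outputs alone. A minimal sketch, assuming equal-width confidence bins (the bin count here is an illustrative choice, not the citing paper's exact setup):

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=15):
    """ECE: |accuracy - confidence| averaged over confidence bins,
    weighted by the fraction of samples in each bin (Guo et al., 2017)."""
    confidences = probs.max(axis=1)
    correct = (probs.argmax(axis=1) == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - confidences[mask].mean())
    return ece
```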
“…We measure the quality of each sampling method's approximation to the predictive distribution corresponding to the true posterior. We generate the ground truth predictive distribution by running 20,000 samples of SGLD (Welling & Teh, 2011), and follow (Izmailov et al., 2021) by measuring the top-1 agreement and total variation with respect to the ground truth predictive distribution. Total variation is…”
Section: Bayesian Neural Network Subspace Inference
confidence: 99%
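The two metrics in this excerpt, top-1 agreement and total variation against a ground-truth predictive distribution, follow Izmailov et al. (2021). A small sketch, assuming probs and probs_ref are (N, C) arrays of per-input class probabilities:

```python
import numpy as np

def top1_agreement(probs, probs_ref):
    """Fraction of inputs where both predictive distributions pick the same class."""
    return (probs.argmax(axis=1) == probs_ref.argmax(axis=1)).mean()

def total_variation(probs, probs_ref):
    """Mean total-variation distance between per-input predictive distributions:
    TV(p, q) = 0.5 * sum_c |p_c - q_c|, averaged over inputs."""
    return 0.5 * np.abs(probs - probs_ref).sum(axis=1).mean()
```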
“…The task is 10-class image classification, and we use a ResNet-20 with Filter Response Normalization from (Izmailov et al., 2021) as our base model. Our results in Figure 5 demonstrate that our online thinning method outperforms both the baseline sampler and SPMCMC-based samplers on the agreement metric.…”
Section: CIFAR-10 Classification
confidence: 99%
“…This implies that good Bayesian inference can only be made if the statistics of y are correctly modeled. For example, stochastic neural networks are expected to have well-calibrated uncertainty estimates, a trait that is highly desirable for practical safe and reliable applications (Wilson & Izmailov, 2020; Gawlikowski et al., 2021; Izmailov et al., 2021). This expectation means that a well-trained stochastic network should have a predictive variance that matches the actual level of randomness in the labeling.…”
Section: Related Work
confidence: 99%
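One way to make the final sentence of this excerpt concrete is as a regression diagnostic comparing average predictive variance to the empirical squared residual; the sketch below is a hypothetical illustration under that reading, not a construction from the cited works:

```python
import numpy as np

def variance_calibration_gap(pred_mean, pred_var, y):
    """Rough calibration check for a stochastic regressor: if the predictive
    variance matches the actual label noise, the mean predictive variance
    should roughly equal the mean squared residual, i.e. the gap is near zero."""
    return float(np.mean(pred_var) - np.mean((y - pred_mean) ** 2))
```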