Function Space Particle Optimization for Bayesian Neural Networks

Wang, Ziyu; Ren, Tongzheng; Zhu, Jun; Zhang, Bo

doi:10.48550/arxiv.1902.09754

Cited by 6 publications

(10 citation statements)

References 18 publications

“…Kendall & Gal (2017) used MC dropout for model uncertainty and combined it with the idea by Nix & Weigend (1994) of directly modeling mean and data noise as network outputs. Wang et al (2019) and Wen et al (2020) refined ensemble methods for NNs by further promoting their diversity on the function space and by reducing their computational cost, respectively. For classification, Malinin & Gales (2018) introduced prior networks, which explicitly model in-sample and out-of-sample uncertainty, where the latter is realized by minimizing the reverse KL-distance to a selected flat pointwise defined prior.…”

Section: Overview Of Our Contribution (mentioning)

confidence: 99%

NOMU: Neural Optimization-based Model Uncertainty

Jakob, Weissteiner, Wutte et al. 2021

Preprint

We introduce a new approach for capturing model uncertainty for neural networks (NNs) in regression, which we call Neural Optimization-based Model Uncertainty (NOMU). The main idea of NOMU is to design a network architecture consisting of two connected sub-networks, one for the model prediction and one for the model uncertainty, and to train it using a carefully designed loss function. With this design, NOMU can provide model uncertainty for any given (previously trained) NN by plugging it into the framework as the sub-network used for model prediction. NOMU is designed to yield uncertainty bounds (UBs) that satisfy four important desiderata regarding model uncertainty, which established methods often do not satisfy. Furthermore, our UBs are themselves representable as a single NN, which leads to computational cost advantages in applications such as Bayesian optimization. We evaluate NOMU experimentally in multiple settings. For regression, we show that NOMU performs as well as or better than established benchmarks. For Bayesian optimization, we show that NOMU outperforms all other benchmarks.

“…We study different kernel functions in the weight and function space, as well as deterministic and stochastic update rules. This includes some existing approaches as special cases, such as standard deep ensembles [33], BNN weight-space SVGD [26], and BNN function-space SVGD [55], but it also includes several novel approaches. We will lay out the motivation and theoretical properties for each approach and later proceed to empirically evaluating their respective performance.…”

Section: Stein Variational Neural Network Ensembles (mentioning)

confidence: 99%

“…theoretically and empirically. We also include two new approaches in our comparison and show that our hybrid h-SVGD method, that acts both in the weight and function space, and our fw-SVGD method, that fixes an issue with an existing functional SVGD approach [55], lead to more diverse ensembles and improved uncertainty estimation and out-of-distribution detection, as well as approaching the gold-standard Hamiltonian Monte Carlo posterior more closely.…”

Section: Introduction (mentioning)

confidence: 99%

On Stein Variational Neural Network Ensembles

D’Angelo, Fortuin, Wenzel 2021

Preprint

Ensembles of deep neural networks have achieved great success recently, but they do not offer a proper Bayesian justification. Moreover, while they allow for averaging of predictions over several hypotheses, they do not provide any guarantees for their diversity, leading to redundant solutions in function space. In contrast, particle-based inference methods, such as Stein variational gradient descent (SVGD), offer a Bayesian framework, but rely on the choice of a kernel to measure the similarity between ensemble members. In this work, we study different SVGD methods operating in the weight space, function space, and in a hybrid setting. We compare the SVGD approaches to other ensembling-based methods in terms of their theoretical properties and assess their empirical performance on synthetic and real-world tasks. We find that SVGD using functional and hybrid kernels can overcome the limitations of deep ensembles. It improves on functional diversity and uncertainty estimation and approaches the true Bayesian posterior more closely. Moreover, we show that using stochastic SVGD updates, as opposed to the standard deterministic ones, can further improve the performance. Preprint. Under review.

“…In the context of machine learning a number of recent works have proposed gradient flow formulations of methods for sampling and variational inference, see for example [4,40,44,50,86,88].…”

Section: Previous Work (mentioning)

confidence: 99%

On the geometry of Stein variational gradient descent

Duncan, Nuesken, Szpruch 2019

Preprint

Bayesian inference problems require sampling or approximating high-dimensional probability distributions. The focus of this paper is on the recently introduced Stein variational gradient descent methodology, a class of algorithms that rely on iterated steepest descent steps with respect to a reproducing kernel Hilbert space norm. This construction leads to interacting particle systems, the mean-field limit of which is a gradient flow on the space of probability distributions equipped with a certain geometrical structure. We leverage this viewpoint to shed some light on the convergence properties of the algorithm, in particular addressing the problem of choosing a suitable positive definite kernel function. Our analysis leads us to considering certain nondifferentiable kernels with adjusted tails. We demonstrate significant performance gains of these in various numerical experiments. Recently there has been interest in particle optimisation techniques which combine aspects of both…
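The interacting particle system mentioned in this abstract can be illustrated with a minimal sketch of a generic SVGD update. This is not the cited authors' code: the Gaussian (RBF) kernel, the fixed bandwidth `h`, the step size, and the function names `svgd_step` and `grad_log_p` are all assumptions chosen for illustration, and the nondifferentiable kernels studied in the paper are not reproduced here. Each particle is pulled toward high-density regions by a kernel-weighted average of score values and pushed away from its neighbours by the kernel gradient.

```python
import numpy as np


def svgd_step(x, grad_log_p, step=0.1, h=1.0):
    """One generic SVGD update with an RBF kernel (illustrative sketch).

    x          : array of shape (n, d), the current particles
    grad_log_p : callable mapping (n, d) -> (n, d), the score of the target
    step, h    : step size and kernel bandwidth (assumed fixed here)
    """
    n = x.shape[0]
    diff = x[:, None, :] - x[None, :, :]           # diff[i, j] = x_i - x_j
    sq_dist = np.sum(diff ** 2, axis=-1)           # squared distances, (n, n)
    K = np.exp(-sq_dist / (2.0 * h ** 2))          # K[i, j] = k(x_j, x_i)

    # Attraction: sum_j k(x_j, x_i) * grad log p(x_j)
    attract = K @ grad_log_p(x)
    # Repulsion: sum_j grad_{x_j} k(x_j, x_i) = sum_j K[i, j] (x_i - x_j) / h^2
    repulse = np.sum(K[:, :, None] * diff, axis=1) / h ** 2

    return x + step * (attract + repulse) / n


if __name__ == "__main__":
    # Toy target: a standard normal, whose score is -x (an assumed example).
    rng = np.random.default_rng(0)
    particles = 3.0 + 0.1 * rng.standard_normal((50, 1))
    for _ in range(1000):
        particles = svgd_step(particles, lambda z: -z)
    print("mean:", particles.mean(), "std:", particles.std())
```

In this toy run the particles start tightly clustered away from the mode; the attraction term moves them toward the target while the repulsion term spreads them out, which is the diversity mechanism the citing papers refer to. The choice of kernel and bandwidth strongly affects the result, which is the question the abstract above addresses.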
