2021
DOI: 10.48550/arxiv.2108.11000
Preprint

Layer Adaptive Node Selection in Bayesian Neural Networks: Statistical Guarantees and Implementation Details

Abstract: Sparse deep neural networks have proven to be efficient for predictive model building in large-scale studies. Although several works have studied theoretical and numerical properties of sparse neural architectures, they have primarily focused on edge selection. Sparsity through edge selection might be intuitively appealing; however, it does not necessarily reduce the structural complexity of a network. Instead, pruning excessive nodes in each layer leads to a structurally sparse network which would have low…
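
The abstract's distinction between edge selection and node selection can be made concrete with a short sketch. The layer width, magnitude threshold, and node-scoring rule below are illustrative assumptions rather than the paper's actual criterion; the point is only that edge-level masking leaves the layer's width intact, while node-level masking removes whole units and yields a structurally narrower network.

import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 128))  # one fully connected layer: 64 output nodes, 128 inputs

# Edge selection: zero out individual weights by magnitude.
# The layer still has all 64 nodes, so the architecture is structurally unchanged.
edge_mask = (np.abs(W) > 1.0).astype(W.dtype)
W_edge = W * edge_mask

# Node selection: drop entire output nodes (whole rows of W).
# The surviving sub-network is genuinely narrower, i.e. structurally sparse.
node_score = np.linalg.norm(W, axis=1)                 # one score per output node
node_mask = (node_score > np.median(node_score)).astype(W.dtype)
W_node = W * node_mask[:, None]

print("fraction of edges pruned:", 1 - edge_mask.mean())
print("nodes kept: %d of %d" % (int(node_mask.sum()), node_mask.size))
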

Cited by 1 publication (5 citation statements)
References 15 publications

“…Baselines. Our baselines include the frequentist model of a deterministic deep neural network (trained with SGD), BNN [39], spike-and-slab BNN for node sparsity [30], single forward pass ensemble models including rank-1 BNN Gaussian ensemble [10], MIMO [11], and EDST ensemble [12], multiple forward pass ensemble methods: DST ensemble [12] and Dense ensemble of deterministic neural networks. For fair comparison, we keep the training hardware, environment, data augmentation, and training schedules of all the models same.…”
Section: Experiments: Results and Analysis
confidence: 99%
“…Dynamic sparsity learning for our sequential ensemble of sparse BNNs is achieved via spike-and-slab prior: a Dirac spike (δ 0 ) at 0 and a uniform slab distribution elsewhere [40]. We adopt the sparse BNN model of [30] to achieve the structural sparsity in Bayesian neural networks. Specifically a common indicator variable z is used for all the weights incident on a node which helps to prune away the given node while training.…”
Section: Related Work
confidence: 99%
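The node-level indicator described in the statement above (one shared variable z per node gating every weight incident on that node) can be sketched as follows. This is a hedged illustration under my own assumptions: the module name NodeSparseLinear, the relaxed-Bernoulli sampling, and the 0.5 pruning threshold are mine and do not come from the authors' implementation of [30]; only the gating mechanism, not the full spike-and-slab variational objective, is shown.

import torch
import torch.nn as nn
import torch.nn.functional as F

class NodeSparseLinear(nn.Module):
    """Linear layer whose output nodes are gated by a shared indicator z per node."""
    def __init__(self, in_features, out_features, temperature=0.1):
        super().__init__()
        self.weight = nn.Parameter(0.1 * torch.randn(out_features, in_features))
        self.bias = nn.Parameter(torch.zeros(out_features))
        self.z_logit = nn.Parameter(torch.zeros(out_features))  # variational logit of q(z_j = 1)
        self.temperature = temperature

    def forward(self, x):
        q = torch.sigmoid(self.z_logit)
        if self.training:
            # Relaxed Bernoulli (concrete) sample keeps z differentiable during training.
            u = torch.rand_like(q).clamp(1e-6, 1 - 1e-6)
            z = torch.sigmoid((q.log() - (1 - q).log() + u.log() - (1 - u).log()) / self.temperature)
        else:
            z = (q > 0.5).float()  # hard prune at test time: z_j = 0 removes node j entirely
        # One indicator per output node multiplies the whole row of incident weights.
        return F.linear(x, self.weight * z.unsqueeze(1), self.bias * z)

layer = NodeSparseLinear(128, 64)
print(layer(torch.randn(32, 128)).shape)  # torch.Size([32, 64])

A complete treatment would also add the KL divergence between q(z) and the spike-and-slab prior (Dirac spike at 0, uniform slab elsewhere) to the training objective; that term is omitted in this sketch.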