2017
DOI: 10.48550/arxiv.1702.04008
Preprint

Soft Weight-Sharing for Neural Network Compression

Abstract: The success of deep learning in numerous application domains has created the desire to run and train such models on mobile devices. This, however, conflicts with their computation-, memory-, and energy-intensive nature, leading to a growing interest in compression. Recent work by Han et al. (2015a) proposes a pipeline that involves retraining, pruning, and quantization of neural network weights, obtaining state-of-the-art compression rates. In this paper, we show that competitive compression rates can be achieved by using a …
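Soft weight-sharing (in the sense of Nowlan & Hinton, 1992) retrains the network under a mixture-of-Gaussians prior whose means, variances, and mixing proportions are learned jointly with the weights, so the weights cluster around a few shared values and a zero-centred component absorbs prunable ones. The PyTorch sketch below is a minimal, hypothetical illustration of such a prior used as a regularizer; the class and function names, the component count, the initialization, and the trade-off factor tau are assumptions for illustration, not the authors' exact settings.

```python
# Hypothetical sketch of a learnable mixture-of-Gaussians prior used as a
# weight regularizer (illustrative only, not the paper's exact code).
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class MixtureOfGaussiansPrior(nn.Module):
    """Learnable mixture-of-Gaussians prior over all network weights.

    Minimizing the negative log-likelihood of the weights under this mixture
    pulls them toward a small set of shared cluster centres; the component
    pinned at zero collects weights that can later be pruned away.
    """

    def __init__(self, n_components=16):
        super().__init__()
        # Means of the non-zero components; component 0 is fixed at zero.
        self.free_means = nn.Parameter(torch.linspace(-0.5, 0.5, n_components - 1))
        self.log_stds = nn.Parameter(torch.full((n_components,), -2.0))
        self.logits = nn.Parameter(torch.zeros(n_components))  # mixing proportions

    def neg_log_prob(self, weights):
        w = weights.reshape(-1, 1)                                      # (N, 1)
        means = torch.cat([torch.zeros(1, device=w.device), self.free_means])
        stds = self.log_stds.exp()
        log_pi = F.log_softmax(self.logits, dim=0)
        # log N(w | mu_k, sigma_k^2) for every weight/component pair -> (N, K)
        log_comp = (-0.5 * ((w - means) / stds) ** 2
                    - self.log_stds
                    - 0.5 * math.log(2.0 * math.pi))
        return -torch.logsumexp(log_pi + log_comp, dim=1).sum()


def compression_loss(model, prior, task_loss, tau=1e-5):
    """Total objective: task loss plus a scaled complexity cost of the weights."""
    all_weights = torch.cat([p.reshape(-1) for p in model.parameters()])
    return task_loss + tau * prior.neg_log_prob(all_weights)
```

After training, weights can be quantized to the nearest learned component mean, and weights assigned to the zero component can be pruned; both steps follow directly from the fitted mixture.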

Cited by 58 publications (77 citation statements)
References 20 publications
“…LeNet-300-100: DNS [35] 1.99 / 1.79; L-OBS [11] 1.96 / 1.5; SWS [7] 1.94 / 4.3; Sparse VD [8] 1.92 / 1.47; Ours 1.98 ± 0.07 / 1.51 ± 0.07. LeNet-5: DNS [35] 0.91 / 0.93; L-OBS [11] 1.66 / 0.9; SWS [7] 0.97 / 0.5; Sparse VD [8] 0.75 / 0.36; Ours 0.97 ± 0.05 / 0.65 ± 0.02. Table 1: Results for LeNet-300-100 and LeNet-5 trained and pruned on MNIST. For pruning, 1000 random training samples are chosen, α_fc = 0.95, α_conv = 0.9.…”
Section: Network Methods (mentioning)
confidence: 99%
“…Pruning assumes particular relevance for deep neural networks because modern architectures involve several million parameters. Existing pruning methods are based on different strategies, e.g. Hessian analysis [2,3], magnitudes of weights [4], data-driven approaches [5,6], among others [7,8]. Pruning can be done in one shot [9] or in an iterative way [10], and it is possible to prune connections [2,3,11,12], neurons [13,14,6] or filters of convolutional layers [15,16].…”
Section: Introduction (mentioning)
confidence: 99%
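To make the magnitude-of-weights strategy from the statement above concrete, here is a minimal, hypothetical PyTorch sketch of one-shot, layer-wise magnitude pruning with separate keep ratios for fully connected and convolutional layers; the function name and default ratios are illustrative assumptions, not the procedure of any cited paper.

```python
# Hypothetical sketch of one-shot, layer-wise magnitude pruning (illustrative only).
import torch
import torch.nn as nn


def magnitude_prune(model, keep_fc=0.05, keep_conv=0.10):
    """Zero out the smallest-magnitude weights, layer by layer.

    keep_fc / keep_conv are the fractions of weights retained in linear and
    convolutional layers.  Returns binary masks that can be re-applied after
    each fine-tuning step so pruned connections stay at zero.
    """
    masks = {}
    for name, module in model.named_modules():
        if isinstance(module, nn.Linear):
            keep = keep_fc
        elif isinstance(module, nn.Conv2d):
            keep = keep_conv
        else:
            continue
        w = module.weight.data
        k = max(1, int(keep * w.numel()))
        # Keep the k largest magnitudes in this layer, prune the rest.
        threshold = w.abs().flatten().topk(k).values.min()
        mask = (w.abs() >= threshold).to(w.dtype)
        module.weight.data.mul_(mask)
        masks[name] = mask
    return masks
```

An iterative variant would simply repeat this step between fine-tuning epochs, gradually lowering the keep ratios, instead of pruning in one shot.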
“…We adopt the pruning technique for its simplicity, which seeks to induce sparse connections. There are many hybrid pruning methods [10,22,56] that are suitable for model deployment, but they may be overkill for our purpose of searching and designing the architecture after the compression. That being said, compression plays a completely different role in our work, namely it works as a tool for a better understanding of the underlying architecture and makes room for further improvements.…”
Section: Pruning-based Model Compression (mentioning)
confidence: 99%
“…From a modelling perspective, specifying the functional forms of the prior and posterior distributions is an essential step in performing variational BNNs. One of the most commonly used variational families is the fully factorized distribution, referred to as the mean-field variational family (Graves, 2011; Blundell et al., 2015; Kingma et al., 2015; Neklyudov et al., 2017; Ullrich et al., 2017; Molchanov et al., 2017)…”
Section: Variational Bayesian Learning For Neural Network (mentioning)
confidence: 99%
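As a concrete, hypothetical illustration of the mean-field family discussed above, the sketch below implements a linear layer with an independent Gaussian posterior per weight, trained via the reparameterization trick; the class name, prior scale, and initialization are assumptions for illustration rather than the setup of any of the cited works.

```python
# Hypothetical sketch of a mean-field (fully factorized Gaussian) Bayesian
# linear layer (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.distributions import Normal, kl_divergence


class MeanFieldLinear(nn.Module):
    """Linear layer with a fully factorized Gaussian posterior over weights.

    q(W) = prod_ij N(w_ij | mu_ij, sigma_ij^2): by construction this family
    cannot represent correlations between different weights.
    """

    def __init__(self, in_features, out_features, prior_std=1.0):
        super().__init__()
        self.mu = nn.Parameter(0.05 * torch.randn(out_features, in_features))
        self.rho = nn.Parameter(torch.full((out_features, in_features), -5.0))
        self.prior = Normal(0.0, prior_std)

    def forward(self, x):
        sigma = F.softplus(self.rho)                   # positive standard deviations
        w = self.mu + sigma * torch.randn_like(sigma)  # reparameterization trick
        # KL(q || p) summed over all weights, used in the negative ELBO.
        self.kl = kl_divergence(Normal(self.mu, sigma), self.prior).sum()
        return F.linear(x, w)


# Negative ELBO for a minibatch (layer output fed into a task loss), e.g.:
# loss = F.cross_entropy(layer(x), y) + kl_weight * layer.kl
```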
“…These, however, rely on prior and posterior pairs that are often chosen for convenience in inference, namely computational tractability. The so-called mean-field variational family in these works assumes the posterior distributions to be fully factorized, and hence neglects the possibility of modelling statistical dependencies (i.e., correlations) among weight parameters (Graves, 2011; Blundell et al., 2015; Kingma et al., 2015; Neklyudov et al., 2017; Ullrich et al., 2017; Molchanov et al., 2017).…”
Section: Introduction (mentioning)
confidence: 99%
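In symbols, the restriction described in this statement is that a mean-field posterior factorizes over individual weights and therefore has zero posterior covariance between them, unlike, for instance, a full-covariance Gaussian; a schematic comparison:

```latex
% Mean-field (fully factorized) posterior over weights w = (w_1, \dots, w_D):
q_{\mathrm{MF}}(w) = \prod_{i=1}^{D} q_i(w_i)
\quad \Longrightarrow \quad
\operatorname{Cov}_{q_{\mathrm{MF}}}\!\left(w_i, w_j\right) = 0 \quad \text{for } i \neq j .

% A correlated alternative, e.g. a full-covariance Gaussian posterior:
q_{\mathrm{full}}(w) = \mathcal{N}(w \mid \mu, \Sigma), \qquad \Sigma \ \text{not necessarily diagonal}.
```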