2019
DOI: 10.48550/arxiv.1906.06624
Preprint

Scalable Model Compression by Entropy Penalized Reparameterization

Abstract: We describe an end-to-end neural network weight compression approach that draws inspiration from recent latent-variable data compression methods. The network parameters (weights and biases) are represented in a "latent" space, amounting to a reparameterization. This space is equipped with a learned probability model, which is used to impose an entropy penalty on the parameter representation during training, and to compress the representation using arithmetic coding after training. We are thus maximizing accuracy…
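The abstract describes a rate-accuracy trade-off: the task loss is augmented with an entropy penalty computed under a learned probability model over the reparameterized weights. Below is a minimal, hedged sketch of such an objective in PyTorch; all names (latent, prior_logits, decode, lam) are illustrative assumptions rather than the authors' code, and for simplicity the rate term here only trains the prior, whereas the paper backpropagates it into the latents through a continuous density model.

```python
# Hedged sketch of an entropy-penalized reparameterization objective (not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

num_symbols, lam = 32, 0.01
latent = nn.Parameter(torch.randn(256))                # reparameterized weights ("latent" space)
prior_logits = nn.Parameter(torch.zeros(num_symbols))  # learned probability model over symbols

def decode(z):
    # Placeholder decoder mapping latents back to network weights (identity here).
    return z

def rate_bits(z):
    # Expected code length under the learned prior; additive uniform noise stands in
    # for rounding so the training objective stays well defined.
    noisy = z + torch.empty_like(z).uniform_(-0.5, 0.5)
    idx = noisy.round().clamp(0, num_symbols - 1).long()
    probs = F.softmax(prior_logits, dim=0)
    return -torch.log2(probs[idx]).sum()

w = decode(latent)
task_loss = (w ** 2).mean()                 # stand-in for the network's training loss
loss = task_loss + lam * rate_bits(latent)  # rate-accuracy trade-off from the abstract
loss.backward()
```

After training, the rounded latents would be entropy-coded (e.g., with an arithmetic coder) under the learned prior, which is what makes the penalty a proxy for the final compressed model size.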

Cited by 7 publications (9 citation statements)
References 22 publications
“…Table 1 shows that we outperform digital storage even when weights are quantized to 8 bits, since accuracy is preserved with 1 cell per weight on CIFAR-10 (with 0.1% loss) and with 3 cells per weight on ImageNet (with 3.6% loss). We do not compare our results with more aggressive quantization techniques (Khoram & Li, 2018; Banner et al., 2018; Oktay et al., 2019; Wiedemann et al., 2020; Choi et al., 2020; Fan et al., 2020) that can achieve higher efficiency in digital storage, since they also bring substantial complexity with multiple retraining stages.…”
Section: Sparsity and Sensitivity Driven Protection (mentioning)
confidence: 93%
“…Although probabilistic approaches to model compression have been explored in (Louizos et al., 2017; Reagan et al., 2018; Havasi et al., 2019), we additionally consider the physical constraints of the storage device used for memory as part of our learning scheme. Our end-to-end approach is most similar to (Oktay et al., 2019), in that they also learn a decoder to map NN weights into a latent space prior to quantization. However, our method differs in that our autoencoder also learns the appropriate compression scheme (with an encoder), we inject noise into our compressed weights rather than quantizing them, and we do not require training a NN from scratch.…”
Section: Related Work (mentioning)
confidence: 99%
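A brief, hedged illustration of the noise-injection idea mentioned in the quote above: during training, additive uniform noise of one quantization-bin width stands in for the hard rounding applied when the model is actually stored. Variable names and the step size are assumptions for illustration only.

```python
# Train-time noise surrogate vs. storage-time quantization (illustrative only).
import torch

step = 0.1  # assumed quantization step size

def train_time(w):
    # Differentiable surrogate: uniform noise spanning one quantization bin.
    return w + (torch.rand_like(w) - 0.5) * step

def storage_time(w):
    # Actual quantize/dequantize applied when the model is stored.
    return torch.round(w / step) * step

w = torch.randn(5, requires_grad=True)
print(train_time(w))    # noisy but differentiable w.r.t. w
print(storage_time(w))  # hard-quantized values
```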
“…Towards this end, many approaches have been proposed: context-adaptive binary arithmetic coding [12], learning the quantized parameters using the local reparametrization trick [22], clustering similar parameters across different layers [11], using matrix factorization followed by Tucker decomposition [10], training adversarial neural networks towards compression [23], or employing a Huffman encoding scheme [24], to name just a few. Recently, Oktay et al. proposed an entropy-penalized reparametrization of the parameters of a deep model, which achieves competitive compression rates while sacrificing a little of the deep model's performance [13]. However, their approach carries some training overhead: it requires training a decoder, and the formulation is made differentiable through the use of straight-through estimators (STE).…”
Section: Minimizing the Stored Model's Memory (mentioning)
confidence: 99%
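For reference, a straight-through estimator (STE) of the kind mentioned in the quote above rounds in the forward pass but passes gradients through unchanged. This is a generic sketch, not the cited authors' implementation.

```python
# Minimal straight-through estimator for rounding (generic illustration).
import torch

class RoundSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return x.round()    # hard rounding in the forward pass

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output  # identity gradient: treat rounding as y = x

x = torch.randn(8, requires_grad=True)
y = RoundSTE.apply(x)
y.sum().backward()
print(x.grad)               # all ones: gradients pass straight through the rounding
```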
“…A third approach consists in quantizing the network parameters [10,11,12], possibly followed by entropy coding of the quantized parameters. Similar approaches achieve promising results; however, most quantization schemes merely aim at learning a compressible representation of the parameters [12,13,8] rather than properly minimizing the entropy of the compressed parameters. Indeed, the entropy of the quantized parameters is not differentiable and cannot easily be minimized in standard gradient-descent-based frameworks.…”
Section: Introduction (mentioning)
confidence: 99%
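The non-differentiability noted in the quote above can be seen from how the entropy of quantized parameters is estimated: it requires counting discrete symbol frequencies. A small illustrative sketch follows; the helper names and the quantization step are hypothetical.

```python
# Empirical entropy of quantized weights: a counting operation, hence not differentiable.
import numpy as np

def quantize(w, step=0.05):
    return np.round(w / step).astype(np.int64)

def empirical_entropy_bits(symbols):
    _, counts = np.unique(symbols, return_counts=True)  # discrete frequency counts
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum() * symbols.size       # total bits if ideally entropy-coded

w = np.random.randn(10_000).astype(np.float32)
q = quantize(w)
print(f"~{empirical_entropy_bits(q) / 8 / 1024:.1f} KiB entropy-coded vs {w.nbytes / 1024:.1f} KiB raw")
```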
“…An alternative approach to learning under resource constraints is focused on reducing the overall model complexity, e.g., by bounding the model size [31,39], pruning [22,3,17,59], or restricting weights to be binary [16]. Adaptation of these methods to FL and their analysis in such contexts are open research questions.…”
Section: Communication Reduction Strategies (mentioning)
confidence: 99%