2019
DOI: 10.48550/arxiv.1906.06624
Preprint

Scalable Model Compression by Entropy Penalized Reparameterization

Abstract: We describe an end-to-end neural network weight compression approach that draws inspiration from recent latent-variable data compression methods. The network parameters (weights and biases) are represented in a "latent" space, amounting to a reparameterization. This space is equipped with a learned probability model, which is used to impose an entropy penalty on the parameter representation during training, and to compress the representation using arithmetic coding after training. We are thus maximizing accuracy…
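The abstract describes a rate-accuracy trade-off: the task loss is augmented with an entropy penalty computed under a learned probability model over the reparameterized weights. Below is a minimal, hedged sketch of such an objective in PyTorch; all names (latent, prior_logits, decode, lam) are illustrative assumptions rather than the authors' code, and for simplicity the rate term here only trains the prior, whereas the paper backpropagates it into the latents through a continuous density model.

```python
# Hedged sketch of an entropy-penalized reparameterization objective (not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

num_symbols, lam = 32, 0.01
latent = nn.Parameter(torch.randn(256))                # reparameterized weights ("latent" space)
prior_logits = nn.Parameter(torch.zeros(num_symbols))  # learned probability model over symbols

def decode(z):
    # Placeholder decoder mapping latents back to network weights (identity here).
    return z

def rate_bits(z):
    # Expected code length under the learned prior; additive uniform noise stands in
    # for rounding so the training objective stays well defined.
    noisy = z + torch.empty_like(z).uniform_(-0.5, 0.5)
    idx = noisy.round().clamp(0, num_symbols - 1).long()
    probs = F.softmax(prior_logits, dim=0)
    return -torch.log2(probs[idx]).sum()

w = decode(latent)
task_loss = (w ** 2).mean()                 # stand-in for the network's training loss
loss = task_loss + lam * rate_bits(latent)  # rate-accuracy trade-off from the abstract
loss.backward()
```

After training, the rounded latents would be entropy-coded (e.g., with an arithmetic coder) under the learned prior, which is what makes the penalty a proxy for the final compressed model size.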

Cited by 7 publications (9 citation statements)
References 22 publications
“…Table 1 shows that we outperform digital storage even when weights are quantized to 8 bits, since accuracy is preserved with 1 cell per weight on CIFAR-10 (with 0.1% loss) and with 3 cells per weight on ImageNet (with 3.6% loss). We do not compare our results with more aggressive quantization techniques (Khoram & Li, 2018; Banner et al., 2018; Oktay et al., 2019; Wiedemann et al., 2020; Choi et al., 2020; Fan et al., 2020) that can achieve higher efficiency in digital storage, since they also bring substantial complexity with multiple retraining stages.…”
Section: Sparsity and Sensitivity Driven Protection (mentioning)
confidence: 93%
“…Although probabilistic approaches to model compression have been explored in (Louizos et al., 2017; Reagan et al., 2018; Havasi et al., 2019), we additionally consider the physical constraints of the storage device used for memory as part of our learning scheme. Our end-to-end approach is most similar to (Oktay et al., 2019), in that they also learn a decoder to map NN weights into a latent space prior to quantization. However, our method differs in that our autoencoder also learns the appropriate compression scheme (with an encoder), we inject noise into our compressed weights rather than quantizing them, and we do not require training a NN from scratch.…”
Section: Related Work (mentioning)
confidence: 99%
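A brief, hedged illustration of the noise-injection idea mentioned in the quote above: during training, additive uniform noise of one quantization-bin width stands in for the hard rounding applied when the model is actually stored. Variable names and the step size are assumptions for illustration only.

```python
# Train-time noise surrogate vs. storage-time quantization (illustrative only).
import torch

step = 0.1  # assumed quantization step size

def train_time(w):
    # Differentiable surrogate: uniform noise spanning one quantization bin.
    return w + (torch.rand_like(w) - 0.5) * step

def storage_time(w):
    # Actual quantize/dequantize applied when the model is stored.
    return torch.round(w / step) * step

w = torch.randn(5, requires_grad=True)
print(train_time(w))    # noisy but differentiable w.r.t. w
print(storage_time(w))  # hard-quantized values
```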
“…Towards this end, many approaches have been proposed: context-adaptive binary arithmetic coding [12], learning the quantized parameters using the local reparametrization trick [22], clustering similar parameters across different layers [11], using matrix factorization followed by Tucker decomposition [10], training adversarial neural networks towards compression [23], or employing a Huffman encoding scheme [24], to name just a few. Recently, Oktay et al. proposed an entropy-penalized reparametrization of the parameters of a deep model, which achieves competitive compression rates while sacrificing a little of the deep model's performance [13]. However, their approach carries some training overhead: it requires training a decoder, and the formulation is made differentiable through the use of straight-through estimators (STE).…”
Section: Minimizing the Stored Model's Memory (mentioning)
confidence: 99%
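For reference, a straight-through estimator (STE) of the kind mentioned in the quote above rounds in the forward pass but passes gradients through unchanged. This is a generic sketch, not the cited authors' implementation.

```python
# Minimal straight-through estimator for rounding (generic illustration).
import torch

class RoundSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return x.round()    # hard rounding in the forward pass

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output  # identity gradient: treat rounding as y = x

x = torch.randn(8, requires_grad=True)
y = RoundSTE.apply(x)
y.sum().backward()
print(x.grad)               # all ones: gradients pass straight through the rounding
```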
“…A third approach consists in quantizing the network parameters [10,11,12], possibly followed by entropy coding of the quantized parameters. Similar approaches achieve promising results; however, most quantization schemes merely aim at learning a compressible representation of the parameters [12,13,8] rather than properly minimizing the entropy of the compressed parameters. Indeed, the entropy of the quantized parameters is not differentiable and cannot easily be minimized in standard gradient-descent-based frameworks.…”
Section: Introduction (mentioning)
confidence: 99%
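The non-differentiability noted in the quote above can be seen from how the entropy of quantized parameters is estimated: it requires counting discrete symbol frequencies. A small illustrative sketch follows; the helper names and the quantization step are hypothetical.

```python
# Empirical entropy of quantized weights: a counting operation, hence not differentiable.
import numpy as np

def quantize(w, step=0.05):
    return np.round(w / step).astype(np.int64)

def empirical_entropy_bits(symbols):
    _, counts = np.unique(symbols, return_counts=True)  # discrete frequency counts
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum() * symbols.size       # total bits if ideally entropy-coded

w = np.random.randn(10_000).astype(np.float32)
q = quantize(w)
print(f"~{empirical_entropy_bits(q) / 8 / 1024:.1f} KiB entropy-coded vs {w.nbytes / 1024:.1f} KiB raw")
```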
“…An alternative approach to learning under resource constraints is focused on reducing the overall model complexity, e.g., by bounding the model size [31,39], pruning [22,3,17,59], or restricting weights to be binary [16]. Adaptation of these methods to FL and their analysis in such contexts are open research questions.…”
Section: Communication Reduction Strategies (mentioning)
confidence: 99%