2020
DOI: 10.48550/arxiv.2002.00523
Preprint

Automatic Pruning for Quantized Neural Networks

Abstract: Neural network quantization and pruning are two techniques commonly used to reduce the computational complexity and memory footprint of these models for deployment. However, most existing pruning strategies operate on full-precision models and cannot be directly applied to the discrete parameter distributions after quantization. In contrast, we study a combination of these two techniques to achieve further network compression. In particular, we propose an effective pruning strategy for selecting redundant low-precision f…
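
As a rough illustration of why full-precision pruning criteria transfer poorly to quantized models, the sketch below applies a symmetric uniform quantizer to a stack of convolutional filters: once weights are snapped to a few discrete levels, per-filter norms collapse onto a coarse grid and magnitude-based ranking loses most of its discriminative power. This is a minimal, assumed example (random filters, 2-bit symmetric quantizer), not the quantization scheme used in the paper.

```python
import numpy as np

def uniform_quantize(w, num_bits=2):
    """Symmetric uniform quantizer onto the integer grid {-q_max, ..., q_max} * scale.
    Illustrative sketch only (num_bits >= 2); the paper's exact quantizer may differ."""
    q_max = 2 ** (num_bits - 1) - 1              # e.g. 1 for 2 bits (ternary-like levels)
    scale = np.abs(w).max() / q_max
    return np.clip(np.round(w / scale), -q_max, q_max) * scale

rng = np.random.default_rng(0)
filters = rng.normal(size=(16, 3, 3, 3))          # 16 hypothetical conv filters, 3x3x3 each
q_filters = uniform_quantize(filters, num_bits=2)

# After quantization, each per-filter L1 norm is (number of nonzero weights) * scale,
# so the norms fall on a coarse grid and many filters tie; a full-precision magnitude
# criterion can no longer rank filters reliably.
print(np.sort(np.abs(filters).sum(axis=(1, 2, 3))))
print(np.sort(np.abs(q_filters).sum(axis=(1, 2, 3))))
```
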


Cited by 6 publications (6 citation statements)
References 23 publications
“…MultiMNIST: Our model shows a comparable or even better accuracy with a substantial decrease in the number of parameters. ResNet_XnIDR achieves the best Top-2 accuracy, 99.37%, on MultiMNIST [34].

  Method                                      Accuracy (%)
  Aff-CapsNets [12]                           76.28
  CapsNetSIFT [25]                            91.27
  HGCNet-91 [40]                              94.47
  Ternary connect + Quantized backprop [24]   87.99
  Greedy Algorithm for Quantizing [28]        88.88
  SLB on ResNet20 [43]                        92.1
  SLB on VGG small [43]                       94.1
  DoReFa-Net on VGG-11 [13]                   86.30
  DoReFa-Net on ResNet14 [13]                 89.84…”
Section: Experiments Results
Mentioning, confidence: 99%
“…Touvron et al presented a weight searching algorithm to search for discrete weights and avoid gradient estimation and non-differentiable problems to improve the accuracy during training the quantized deep neural network [43]. Wang et al proposed a pruning algorithm to point out unnecessary low-precision filters and utilize Bayesian optimization to decide the pruning ratio [13]. These papers are very good, but less revolutionary than [11], [16], and [24].…”
Section: Xnor Network
Mentioning, confidence: 99%
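
The Bayesian-optimization step mentioned in the statement above (choosing how much to prune once redundant low-precision filters have been identified) can be sketched as a black-box search over per-layer pruning ratios. The sketch below is a minimal illustration assuming scikit-optimize's gp_minimize as the optimizer; evaluate_pruned_model, the three-layer search space, and the [0, 0.9] bounds are placeholders, not details from the cited paper.

```python
from skopt import gp_minimize
from skopt.space import Real

def evaluate_pruned_model(ratios):
    """Placeholder objective: prune each layer of the quantized network by the given
    ratios, fine-tune briefly, and return a score to minimize (e.g. negative validation
    accuracy). The dummy expression below only stands in for that routine."""
    return sum((r - 0.5) ** 2 for r in ratios)

# One pruning ratio per layer, each searched in [0, 0.9].
search_space = [Real(0.0, 0.9, name=f"layer_{i}") for i in range(3)]

result = gp_minimize(
    evaluate_pruned_model,   # objective to minimize
    search_space,            # per-layer pruning-ratio bounds
    n_calls=20,              # number of Bayesian-optimization evaluations
    random_state=0,
)
print("best per-layer pruning ratios:", result.x)
```
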
“…And Ref. [28] focuses on the gradient part, which scales the gradient according to the position of the weight vector, making it easier to compress. Reference [29] explores how to automatically do pruning when quantization.…”
Section: Pruning and Quantization Mixed Compression Methods
Mentioning, confidence: 99%
“…However, there is a trade-off between accuracy and pruning; accuracy may decrease when the pruning rates increase. In [111], the authors utilize Bayesian optimization for channel pruning for quantized neural networks. That pruning approach based on the angle preservation feature of high dimensional binary vectors [112] and the euclidean distance.…”
Section: B: Pruning
Mentioning, confidence: 99%
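
The "angle preservation" property referred to in the last statement can be made concrete for binary filters: when weights take values in {-1, +1}, the squared Euclidean distance between two flattened filters of length n determines their dot product (||x − y||² = 2n − 2x·y), so the angle between filters is directly recoverable from the Euclidean distance and near-parallel filters can be flagged as redundant. Below is a minimal sketch under that assumption; the random filters, the planted duplicate, and the 0.2 rad threshold are illustrative choices, not values from the paper.

```python
import numpy as np

def pairwise_angles(binary_filters):
    """Angles between flattened {-1, +1} filters, recovered from Euclidean distance.
    For x, y in {-1, +1}^n:  ||x - y||^2 = 2n - 2 x.y,  so  cos(theta) = 1 - ||x - y||^2 / (2n)."""
    flat = binary_filters.reshape(len(binary_filters), -1)
    n = flat.shape[1]
    sq_dist = ((flat[:, None, :] - flat[None, :, :]) ** 2).sum(-1)
    cos_theta = 1.0 - sq_dist / (2.0 * n)
    return np.arccos(np.clip(cos_theta, -1.0, 1.0))

rng = np.random.default_rng(0)
filters = rng.choice([-1.0, 1.0], size=(8, 3, 3, 3))   # 8 hypothetical binary filters
filters[1] = filters[0]                                  # plant a duplicate to detect

angles = pairwise_angles(filters)
np.fill_diagonal(angles, np.inf)                         # ignore self-comparisons
redundant_pairs = np.argwhere(angles < 0.2)              # small angle => near-duplicate filters
print("redundant filter pairs:", redundant_pairs)
```
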