2018 1st Workshop on Energy Efficient Machine Learning and Cognitive Computing for Embedded Applications (EMC2)
DOI: 10.1109/emc2.2018.00011

A Quantization-Friendly Separable Convolution for MobileNets

Abstract: As deep learning (DL) is rapidly being pushed to edge computing, researchers have invented various ways to make inference computation more efficient on mobile/IoT devices, such as network pruning and parameter compression. Quantization, as one of the key approaches, can effectively offload the GPU and make it possible to deploy DL on a fixed-point pipeline. Unfortunately, not all existing network designs are friendly to quantization. For example, the popular lightweight MobileNetV1 [1], while it successfully reduc…
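To make the fixed-point idea concrete, below is a minimal sketch of uniform affine quantization in Python/NumPy: a per-tensor scale and zero-point map FLOAT32 values to INT8 and back. The function names and the min/max range estimation are illustrative assumptions, not the scheme used in the paper.

    import numpy as np

    def quantize_affine(x, num_bits=8):
        # Uniform affine quantization: map float values to signed INT8
        # using a per-tensor scale and zero-point derived from min/max.
        qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
        x_min, x_max = float(x.min()), float(x.max())
        scale = max(x_max - x_min, 1e-8) / (qmax - qmin)
        zero_point = int(round(qmin - x_min / scale))
        q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.int8)
        return q, scale, zero_point

    def dequantize_affine(q, scale, zero_point):
        # Recover an approximate float tensor from its INT8 encoding.
        return scale * (q.astype(np.float32) - zero_point)

    x = np.random.randn(4, 4).astype(np.float32)
    q, scale, zp = quantize_affine(x)
    x_hat = dequantize_affine(q, scale, zp)
    print(np.abs(x - x_hat).max())  # rounding error is bounded by roughly scale / 2

Intuitively, a layer whose values span a wide or skewed dynamic range forces a large scale, and hence a large rounding error, which is why some network designs quantize worse than others.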

Cited by 101 publications (73 citation statements). References 5 publications.
“…In addition we also compare to multiple higher-level approaches, namely quantization-aware training [16] as well as stochastic rounding and dynamic ranges [9,10], which are both level 3 approaches. We also compare to two level 4 approaches: relaxed quantization [21], which involves training a model from scratch, and quantization-friendly separable convolutions [31], which require a rework of the original MobileNet architecture. The results are summarized in Table 5.…”
Section: Comparison to Other Approaches
confidence: 99%
“…DAC has no non-linear layers (batch normalization layers and activation layers) between the depthwise and the pointwise layers. The absence of non-linear layers makes DAC quantization-friendly and hence suitable for further hardware acceleration, which Sheng et al. [22] have already experimentally verified.…”
Section: Convolutional Layer Factorization
confidence: 97%
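As a sketch of that structural point, the two layer orderings can be written side by side in PyTorch. This is an assumed illustration (layer names and ordering loosely follow MobileNetV1 and the quantization-friendly variant), not code from either paper.

    import torch.nn as nn

    def separable_block(cin, cout, quant_friendly=False):
        if quant_friendly:
            # No batch norm or activation between the depthwise and
            # pointwise convolutions, mirroring the DAC observation above.
            return nn.Sequential(
                nn.Conv2d(cin, cin, 3, padding=1, groups=cin, bias=False),
                nn.Conv2d(cin, cout, 1, bias=False),
                nn.BatchNorm2d(cout),
                nn.ReLU(inplace=True),
            )
        # Standard MobileNetV1 ordering: BN + ReLU6 after every convolution.
        return nn.Sequential(
            nn.Conv2d(cin, cin, 3, padding=1, groups=cin, bias=False),
            nn.BatchNorm2d(cin),
            nn.ReLU6(inplace=True),
            nn.Conv2d(cin, cout, 1, bias=False),
            nn.BatchNorm2d(cout),
            nn.ReLU6(inplace=True),
        )

With nothing between the two convolutions, they share one well-behaved dynamic range, so per-layer quantization ranges stay tight.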
“…MobileNet achieves 70.6% accuracy on the ImageNet dataset, thus outperforming the larger AlexNet, SqueezeNet and Inception-V1 models. It can be optimized further for mobile usage by quantization [60,61], converting its weights and activations from FLOAT32 to an INT8 8-bit fixed-point representation. Though this leads to an accuracy drop to 69.7%, the speed is simultaneously more than doubled and the size is reduced (by a factor of 4) to 4.3 MB.…”
Section: Deep Learning Tests
confidence: 99%
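For reference, one common route to the FLOAT32-to-INT8 conversion described in that quote is TensorFlow Lite post-training quantization. The sketch below assumes a Keras MobileNet and the tf.lite converter API; with only Optimize.DEFAULT, weights are quantized, while fully quantizing activations additionally requires a representative calibration dataset.

    import tensorflow as tf

    # Load a float MobileNet (~70.6% ImageNet top-1, per the quote above).
    model = tf.keras.applications.MobileNet(weights="imagenet")

    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]  # enable quantization
    # For full integer (weights + activations) quantization, a calibration
    # set is also needed: converter.representative_dataset = ...

    tflite_model = converter.convert()
    with open("mobilenet_quant.tflite", "wb") as f:
        f.write(tflite_model)  # roughly 4x smaller than the FLOAT32 model

The 4x size reduction follows directly from the storage format: 32-bit floats replaced by 8-bit integers.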