2018 1st Workshop on Energy Efficient Machine Learning and Cognitive Computing for Embedded Applications (EMC2)
DOI: 10.1109/emc2.2018.00011

A Quantization-Friendly Separable Convolution for MobileNets

Abstract: As deep learning (DL) is rapidly being pushed to edge computing, researchers have invented various ways to make inference computation more efficient on mobile/IoT devices, such as network pruning and parameter compression. Quantization, as one of the key approaches, can effectively offload the GPU and make it possible to deploy DL on a fixed-point pipeline. Unfortunately, not all existing network designs are friendly to quantization. For example, the popular lightweight MobileNetV1 [1], while it successfully reduc…
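To make the fixed-point idea concrete, below is a minimal sketch of uniform affine quantization in Python/NumPy: a per-tensor scale and zero-point map FLOAT32 values to INT8 and back. The function names and the min/max range estimation are illustrative assumptions, not the scheme used in the paper.

    import numpy as np

    def quantize_affine(x, num_bits=8):
        # Uniform affine quantization: map float values to signed INT8
        # using a per-tensor scale and zero-point derived from min/max.
        qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
        x_min, x_max = float(x.min()), float(x.max())
        scale = max(x_max - x_min, 1e-8) / (qmax - qmin)
        zero_point = int(round(qmin - x_min / scale))
        q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.int8)
        return q, scale, zero_point

    def dequantize_affine(q, scale, zero_point):
        # Recover an approximate float tensor from its INT8 encoding.
        return scale * (q.astype(np.float32) - zero_point)

    x = np.random.randn(4, 4).astype(np.float32)
    q, scale, zp = quantize_affine(x)
    x_hat = dequantize_affine(q, scale, zp)
    print(np.abs(x - x_hat).max())  # rounding error is bounded by roughly scale / 2

Intuitively, a layer whose values span a wide or skewed dynamic range forces a large scale, and hence a large rounding error, which is why some network designs quantize worse than others.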

Cited by 101 publications (73 citation statements). References 5 publications.
“…In addition we also compare to multiple higher-level approaches, namely quantization-aware training [16] as well as stochastic rounding and dynamic ranges [9,10], which are both level 3 approaches. We also compare to two level 4 approaches: relaxed quantization [21], which involves training a model from scratch, and quantization-friendly separable convolutions [31], which require a rework of the original MobileNet architecture. The results are summarized in Table 5.…”
Section: Comparison to Other Approaches
confidence: 99%
“…DAC has no non-linear layers (batch normalization layers and activation layers) between the depthwise and the pointwise layers. The absence of non-linear layers makes DAC quantization-friendly and hence suitable for further hardware acceleration, which Sheng et al. [22] have already experimentally verified.…”
Section: Convolutional Layer Factorization
confidence: 97%
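As a sketch of that structural point, the two layer orderings can be written side by side in PyTorch. This is an assumed illustration (layer names and ordering loosely follow MobileNetV1 and the quantization-friendly variant), not code from either paper.

    import torch.nn as nn

    def separable_block(cin, cout, quant_friendly=False):
        if quant_friendly:
            # No batch norm or activation between the depthwise and
            # pointwise convolutions, mirroring the DAC observation above.
            return nn.Sequential(
                nn.Conv2d(cin, cin, 3, padding=1, groups=cin, bias=False),
                nn.Conv2d(cin, cout, 1, bias=False),
                nn.BatchNorm2d(cout),
                nn.ReLU(inplace=True),
            )
        # Standard MobileNetV1 ordering: BN + ReLU6 after every convolution.
        return nn.Sequential(
            nn.Conv2d(cin, cin, 3, padding=1, groups=cin, bias=False),
            nn.BatchNorm2d(cin),
            nn.ReLU6(inplace=True),
            nn.Conv2d(cin, cout, 1, bias=False),
            nn.BatchNorm2d(cout),
            nn.ReLU6(inplace=True),
        )

With nothing between the two convolutions, they share one well-behaved dynamic range, so per-layer quantization ranges stay tight.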
“…MobileNet achieves 70.6% accuracy on the ImageNet dataset, thus outperforming the larger AlexNet, SqueezeNet and Inception-V1 models. It can be optimized further for mobile usage by quantization [60,61], converting its weights and activations from FLOAT32 to an INT8 8-bit fixed-point representation. Though this leads to an accuracy drop to 69.7%, the speed is simultaneously more than doubled and the size is reduced (by a factor of 4) to 4.3 MB.…”
Section: Deep Learning Tests
confidence: 99%
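For reference, one common route to the FLOAT32-to-INT8 conversion described in that quote is TensorFlow Lite post-training quantization. The sketch below assumes a Keras MobileNet and the tf.lite converter API; with only Optimize.DEFAULT, weights are quantized, while fully quantizing activations additionally requires a representative calibration dataset.

    import tensorflow as tf

    # Load a float MobileNet (~70.6% ImageNet top-1, per the quote above).
    model = tf.keras.applications.MobileNet(weights="imagenet")

    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]  # enable quantization
    # For full integer (weights + activations) quantization, a calibration
    # set is also needed: converter.representative_dataset = ...

    tflite_model = converter.convert()
    with open("mobilenet_quant.tflite", "wb") as f:
        f.write(tflite_model)  # roughly 4x smaller than the FLOAT32 model

The 4x size reduction follows directly from the storage format: 32-bit floats replaced by 8-bit integers.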