2020
DOI: 10.48550/arxiv.2007.09952
Preprint

HMQ: Hardware Friendly Mixed Precision Quantization Block for CNNs

Abstract: Recent work in network quantization produced state-of-the-art results using mixed precision quantization. An imperative requirement for many efficient edge device hardware implementations is that their quantizers are uniform and with power-of-two thresholds. In this work, we introduce the Hardware Friendly Mixed Precision Quantization Block (HMQ) in order to meet this requirement. The HMQ is a mixed precision quantization block that repurposes the Gumbel-Softmax estimator into a smooth estimator of a pair of q…
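As a reading aid for the abstract, the following is a minimal Python sketch, assuming a symmetric uniform quantizer with a power-of-two threshold and a Gumbel-Softmax soft selection over candidate bit-widths. It is not the authors' HMQ implementation: the candidate bit-widths, the fixed threshold exponent, and all class and function names are illustrative assumptions, and in practice a straight-through estimator would still be needed for gradients to reach the weights through the rounding.

import torch
import torch.nn.functional as F

def uniform_quantize(x, threshold, bits):
    # Symmetric uniform quantizer clipped to [-threshold, threshold].
    levels = 2 ** (bits - 1) - 1          # e.g. 127 levels per side for 8 bits
    step = threshold / levels
    x = torch.clamp(x, -threshold, threshold)
    return torch.round(x / step) * step

class SoftBitwidthSelector(torch.nn.Module):
    # Differentiable mixture over candidate bit-widths via Gumbel-Softmax.
    # The threshold is restricted to a power of two, mirroring the hardware
    # constraint stated in the abstract; candidate bit-widths are assumptions.
    def __init__(self, candidate_bits=(2, 4, 8), threshold_exp=0):
        super().__init__()
        self.candidate_bits = candidate_bits
        self.logits = torch.nn.Parameter(torch.zeros(len(candidate_bits)))
        self.threshold = 2.0 ** threshold_exp     # power-of-two threshold

    def forward(self, w, tau=1.0):
        probs = F.gumbel_softmax(self.logits, tau=tau, hard=False)
        candidates = torch.stack(
            [uniform_quantize(w, self.threshold, b) for b in self.candidate_bits])
        # Weighted sum of the candidate quantizations; gradients flow to the logits.
        return sum(p * q for p, q in zip(probs, candidates))

During training, the temperature tau would typically be annealed so that the soft mixture collapses to a single bit-width choice.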

Cited by 5 publications (14 citation statements) | References 34 publications
“…In [37], the weights and activations are quantized separately in a two-step strategy. Mixed-precision is widely employed to achieve smaller quantization errors, such as LQ-Net [43], DJPQ [38] and HMQ [11]. In HAQ [36], the training policy is learned by reinforcement learning.…”
Section: Quantization Methods (mentioning; confidence: 99%)
“…Previous approaches [43,17,11,38,8] generally model the task of quantization as an error minimization problem, viz. min ‖W − Q(W)‖, where W is the weight and Q(·) the quantizer.…”
Section: Introduction (mentioning; confidence: 99%)
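To make the quoted error-minimization view concrete, here is a small NumPy sketch that, for a fixed bit-width, searches power-of-two thresholds and keeps the one minimizing ‖W − Q(W)‖. The quantizer form, the exponent range, and the names are assumptions for illustration, not code from any of the cited papers.

import numpy as np

def quantize(w, threshold, bits):
    # Symmetric uniform quantizer clipped to [-threshold, threshold].
    levels = 2 ** (bits - 1) - 1
    step = threshold / levels
    return np.clip(np.round(w / step), -levels, levels) * step

def best_pow2_threshold(w, bits, exponents=range(-4, 5)):
    # Return the power-of-two threshold minimizing the L2 quantization error.
    candidates = [2.0 ** e for e in exponents]
    errors = [np.linalg.norm(w - quantize(w, t, bits)) for t in candidates]
    return candidates[int(np.argmin(errors))]

w = 0.5 * np.random.randn(1024)
print(best_pow2_threshold(w, bits=4))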
“…In this section, we compare our GMPQ with the state-of-the-art fixed-precision models containing APoT [25] and RQ [31] and mixed-precision networks including ALQ [38], HAWQ [9], EdMIPS [3], HAQ [50], BP-NAS [56], HMQ [13] and DQ [47] on ImageNet for image classification and on PASCAL VOC for object detection. We also provide the performance of full-precision models for reference.…”
Section: Comparison With State-of-the-art Methods (mentioning; confidence: 99%)
“…Yang et al [55] decoupled the constrained optimization via Alternating Direction Method of Multipliers (ADMM), and Wang et al [53] utilized the variational information bottleneck to search for the proper bitwidth and pruning ratio. Habi et al [13] and Van et al [48] directly optimized the quantization intervals for bitwidth selection of mixed-precision networks. However, differentiable search for mixed-precision quantization still needs a large amount of time due to the optimization of the large hypernet.…”
Section: Mixed-precision Quantization (mentioning; confidence: 99%)
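For the statement that quantization intervals can be optimized directly, a generic sketch of the idea follows: the clipping threshold is a learnable parameter and a straight-through estimator lets gradients pass through the rounding. This is an assumed, simplified illustration, not the specific method of Habi et al [13] or Van et al [48].

import torch

class LearnedIntervalQuantizer(torch.nn.Module):
    # Learn the clipping threshold (quantization interval) of a symmetric
    # uniform quantizer by gradient descent; names and defaults are illustrative.
    def __init__(self, bits=4, init_threshold=1.0):
        super().__init__()
        self.bits = bits
        self.log_threshold = torch.nn.Parameter(torch.tensor(init_threshold).log())

    def forward(self, w):
        t = self.log_threshold.exp()              # keep the threshold positive
        levels = 2 ** (self.bits - 1) - 1
        step = t / levels
        w_clipped = torch.max(torch.min(w, t), -t)
        w_q = torch.round(w_clipped / step) * step
        # Straight-through estimator: forward uses w_q, backward sees identity.
        return w_clipped + (w_q - w_clipped).detach()

In a mixed-precision setting, one such interval (together with a bit-width choice) would be learned per layer or per tensor.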
“…3, we observe that depthwise convolution has larger bitwidth than the regular convolution. As found in (Jain et al, 2019), the depthwise convolution with irregular weight distributions is the main reason that makes quantization … to be superior to their fixed bitwidth counterparts (Wang et al, 2019; Uhlich et al, 2020; Cai & Vasconcelos, 2020; Habi et al, 2020). DDQ is naturally used to perform mixed-precision training by a binary block-diagonal matrix U.…”
Section: Evaluation On ImageNet (mentioning; confidence: 99%)