Low-Power Computer Vision 2022
DOI: 10.1201/9781003162810-6

Fast Adjustable Threshold for Uniform Neural Network Quantization

Abstract: Neural network quantization is a highly desired procedure to perform before running neural networks on mobile devices. Quantization without fine-tuning leads to an accuracy drop of the model, whereas the commonly used training with quantization is done on the full set of labeled data and is therefore both time- and resource-consuming. Real-life applications require a simplified and accelerated quantization procedure that maintains the accuracy of the full-precision neural network, especially for modern mobile …
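The abstract is truncated above, so the paper's exact threshold-adjustment rule is not shown in this excerpt. As a rough illustration of the kind of uniform, threshold-based quantization the title refers to (a hedged sketch, not the paper's algorithm), the snippet below quantizes a tensor symmetrically to int8 with a clipping threshold that could in principle be tuned; the function names and the naive max-based threshold are illustrative assumptions.

```python
import numpy as np

def uniform_quantize(x, threshold, num_bits=8):
    """Symmetric uniform quantization of a tensor to signed integers.

    `threshold` defines the clipping range [-threshold, threshold]; moving it
    trades clipping error against rounding error, which is what makes an
    adjustable threshold useful.
    """
    qmax = 2 ** (num_bits - 1) - 1           # 127 for int8
    scale = threshold / qmax                 # step size of the uniform grid
    q = np.clip(np.round(x / scale), -qmax, qmax)
    return q.astype(np.int8), scale

def dequantize(q, scale):
    """Map integer codes back to the real axis."""
    return q.astype(np.float32) * scale

# Illustrative use: quantize a weight tensor with a naive max-based threshold.
w = np.random.randn(64, 64).astype(np.float32)
q, s = uniform_quantize(w, threshold=float(np.abs(w).max()))
w_hat = dequantize(q, s)
print("reconstruction MSE:", float(np.mean((w - w_hat) ** 2)))
```

The fast adjustment advertised in the title would replace the naive max-based threshold with one tuned on a small amount of data; that procedure is not reproduced here.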

Cited by 9 publications (11 citation statements). References 5 publications.
“…This paper focuses on post-training quantization. Previous works on post-training quantization [7,12,3,17] focus on building generic quantized models. HAGO introduces a novel pipeline that generates optimized quantized models based on the backend specifications.…”
Section: Related Work
confidence: 99%
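The statement above contrasts generic post-training quantization with HAGO's backend-aware pipeline. As a generic illustration of what post-training calibration typically involves (not HAGO's actual pipeline or API), the sketch below records per-layer activation ranges from a few unlabeled batches; `model_forward` and the percentile rule are hypothetical placeholders.

```python
import numpy as np

def calibrate_activation_scales(model_forward, calib_batches, percentile=99.9):
    """Generic post-training calibration sketch: run a handful of unlabeled
    batches through the float model and keep a clipping range per layer.

    `model_forward(batch)` is assumed (hypothetically) to return a dict
    mapping layer names to their activation arrays.
    """
    ranges = {}
    for batch in calib_batches:
        for name, act in model_forward(batch).items():
            hi = float(np.percentile(np.abs(act), percentile))
            ranges[name] = max(ranges.get(name, 0.0), hi)
    # Turn each range into an int8 scale for later uniform quantization.
    return {name: rng / 127.0 for name, rng in ranges.items()}
```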
“…Although the asymmetric design offers superior properties compared to the symmetric design [6,14], it comes with an additional cost at inference time when both weights and activations use this scheme. Equation (21) shows the computations such a multiply-accumulate engine needs to make. Note that the zero-point for the bias is usually omitted [13]:…”
Section: Asymmetric Quantization
confidence: 99%
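Equation (21) of the cited work is not reproduced in this excerpt, but the extra inference-time cost it refers to is the standard expansion of an asymmetric (affine) multiply-accumulate, where the weight and activation zero-points introduce cross terms that a symmetric scheme avoids. The sketch below uses a generic min-max affine quantizer, which is an assumption rather than the cited paper's exact quantizer.

```python
import numpy as np

def affine_quantize(x, num_bits=8):
    """Asymmetric (affine) quantization: x ≈ scale * (q - zero_point)."""
    qmin, qmax = 0, 2 ** num_bits - 1
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / (qmax - qmin) or 1.0
    zero_point = int(round(qmin - lo / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.int64)
    return q, scale, zero_point

rng = np.random.default_rng(0)
w, x = rng.standard_normal(256), rng.standard_normal(256)
qw, sw, zw = affine_quantize(w)
qx, sx, zx = affine_quantize(x)

# Asymmetric MAC: sw*sx * (qw·qx - zw*sum(qx) - zx*sum(qw) + N*zw*zx).
# The three extra terms are the per-inference cost the quote refers to;
# with symmetric quantization (zw = zx = 0) only the first term remains.
n = len(x)
acc = qw @ qx - zw * qx.sum() - zx * qw.sum() + n * zw * zx
print(sw * sx * acc, w @ x)   # close up to quantization error
```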
“…Quantization: DNN Quantization [23] is usually motivated by faster DNN inference, e.g., through fixed-point quantization and arithmetic [2], [24], [25], and energy savings. To avoid reduced accuracy, quantization is considered during training [26], [27] instead of post-training or with finetuning [28], [29], [30], [31], enabling low-bit quantization such as binary DNNs [32], [33]. Some works also consider quantizing activations [32], [34], [35] or gradients [36], [37], [38].…”
Section: Related Work
confidence: 99%
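The survey statement above distinguishes post-training quantization from quantization considered during training. As a hedged sketch (not any specific cited method), the forward quantize-dequantize below is the usual "fake quantization" building block of quantization-aware training; in practice the rounding is paired with a straight-through estimator so gradients can flow, and 1-bit weights are the binary extreme the quote mentions.

```python
import numpy as np

def fake_quantize(x, num_bits=8):
    """Quantize-dequantize ("fake quantization") forward pass as used in
    quantization-aware training: the model sees quantization noise during
    training, while gradients are typically passed straight through the
    rounding (straight-through estimator)."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = float(np.abs(x).max()) / qmax or 1.0
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale                        # float values on the integer grid

w = np.random.randn(4, 4).astype(np.float32)
print(fake_quantize(w, num_bits=8))         # 8-bit fake-quantized weights
print(np.sign(w) * np.abs(w).mean())        # 1-bit case: a common binarization rule
```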