2019
DOI: 10.48550/arxiv.1903.08066
Preprint

Trained Quantization Thresholds for Accurate and Efficient Fixed-Point Inference of Deep Neural Networks

Cited by 14 publications (22 citation statements). References 0 publications.
“…In the case of the scale s, several studies [1, 9, 19, 20] have used an empirical or specific value, such as a power of two. In this work, we assume the range represented by the original b-bit precision weights is equal to that of the up-scaled (b+1)-bit precision weights.…”
Section: Quantization Methods (mentioning)
confidence: 99%
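The range assumption in this excerpt can be checked with a minimal sketch, assuming a symmetric uniform quantizer with step size s and bit-width b; the helper name and the power-of-two scale below are illustrative choices, not values from the cited works. Halving the scale while adding one bit keeps the representable range essentially unchanged.

```python
# Minimal sketch: representable range of a symmetric uniform quantizer
# with step size s and bit-width b (helper name and values are illustrative).
def sym_range(s, b):
    return -s * 2 ** (b - 1), s * (2 ** (b - 1) - 1)

b, s = 8, 2.0 ** -7             # hypothetical power-of-two scale for b bits
print(sym_range(s, b))          # (-1.0, 0.9921875)
print(sym_range(s / 2, b + 1))  # (-1.0, 0.99609375): (b+1) bits at half the scale
                                # covers essentially the same range
```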
“…However, post-training quantization on these models results in an unacceptably sharp decline in accuracy [21], dropping from 90% or better to 1% or worse on the ImageNet dataset. Accuracy can be reclaimed using various methods, including re-training and quantization-aware training [53, 54], but this is not always possible or convenient if the required additional computation or expertise is high, or if the training data are unavailable due to legal or privacy issues. We posit that error-bounded lossy compression algorithms may be an alternative, accuracy-preserving method of compressing depth-wise separable models.…”
Section: Quantization Effectiveness on MobileNets (mentioning)
confidence: 99%
“…Many works address these issues using different methods. These include pruning [16, 45, 47], efficient neural architecture design [14, 21, 24, 38], hardware and CNN co-design [14, 20, 43], and quantization [6, 13, 15, 23, 24, 46].…”
Section: Introduction (mentioning)
confidence: 99%
“…For high compression rates, this is usually achieved by fine-tuning a pre-trained model for quantization. In addition, recent work on quantization has focused on making quantizers more hardware-friendly (amenable to deployment on embedded devices) by restricting quantization schemes to be per-tensor, uniform, symmetric, and with thresholds that are powers of two [24, 41].…”
Section: Introduction (mentioning)
confidence: 99%
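The hardware-friendly restrictions listed in this excerpt (per-tensor, uniform, symmetric, power-of-two thresholds) can be sketched as follows. This is a minimal illustration in Python; the function name, tensor shape, and threshold choice are assumptions made here, not code from the cited paper.

```python
import numpy as np

def fake_quantize_per_tensor(x, log2_t, bits=8):
    # Per-tensor, uniform, symmetric fake quantization with a clipping
    # threshold constrained to a power of two, t = 2**round(log2_t).
    # Illustrative only; not the reference implementation of the cited works.
    t = 2.0 ** np.round(log2_t)
    scale = t / 2 ** (bits - 1)                   # uniform step size
    qmin, qmax = -2 ** (bits - 1), 2 ** (bits - 1) - 1
    q = np.clip(np.round(x / scale), qmin, qmax)  # integer grid, symmetric about 0
    return q * scale                              # simulated fixed-point values in float

# Hypothetical convolution weight tensor, quantized as a single tensor.
w = np.random.randn(64, 3, 3, 3).astype(np.float32)
w_q = fake_quantize_per_tensor(w, log2_t=np.log2(np.abs(w).max()), bits=8)
```

As the paper's title suggests, the quantization thresholds themselves (here, log2_t) would be treated as trainable parameters rather than fixed statistics.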