2019
DOI: 10.48550/arxiv.1901.09504
Preprint

Improving Neural Network Quantization without Retraining using Outlier Channel Splitting

Abstract: Quantization can improve the execution latency and energy efficiency of neural networks on both commodity GPUs and specialized accelerators. The majority of existing literature focuses on training quantized DNNs, while this work examines the less-studied topic of quantizing a floating-point model without (re)training. DNN weights and activations follow a bell-shaped distribution post-training, while practical hardware uses a linear quantization grid. This leads to challenges in dealing with outliers in the distribution…
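As a rough illustration of the problem the abstract describes, the sketch below (plain NumPy, not the authors' code) quantizes a bell-shaped weight vector onto a symmetric linear grid and shows how a single outlier inflates the quantization step, wasting resolution on the bulk of the values.

```python
# Hedged sketch: symmetric linear quantization, with the step size set by the
# largest-magnitude value. A single outlier coarsens the grid for everything else.
import numpy as np

def linear_quantize(x, num_bits=4):
    """Symmetric uniform quantization of x to num_bits signed levels."""
    qmax = 2 ** (num_bits - 1) - 1           # e.g. 7 positive levels for 4 bits
    scale = np.max(np.abs(x)) / qmax         # step size set by the largest magnitude
    q = np.clip(np.round(x / scale), -qmax, qmax)
    return q * scale, scale

rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.05, size=1000)   # bell-shaped bulk of weights
weights[0] = 1.0                             # a single outlier

_, step_with_outlier = linear_quantize(weights)
_, step_without = linear_quantize(weights[1:])
print(f"step with outlier: {step_with_outlier:.4f}, without: {step_without:.4f}")
```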

Cited by 23 publications (29 citation statements). References 23 publications.
“…We found no advantage in doing any kind of weight clipping. This is in line with earlier works that also report no advantage to weight clipping for larger bitwidths (Migacz, 2017; Zhao et al., 2019). Therefore, ACIQ was considered for quantizing activations only.…”
Section: Applicability (supporting)
confidence: 84%
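For context on the clipping discussed in this statement, the snippet below is a generic clip-then-quantize sketch; the percentile threshold is an arbitrary illustrative choice, not ACIQ's analytically derived clipping value.

```python
# Hedged sketch of clipping before quantization: saturating the tails allows a
# finer step size for the bulk of the distribution. The 99.9th-percentile
# threshold is an assumption for illustration, not ACIQ's analytic threshold.
import numpy as np

def clipped_quantize(x, num_bits=8, clip_pct=99.9):
    """Clip tails at a percentile threshold, then quantize on a symmetric grid."""
    t = np.percentile(np.abs(x), clip_pct)   # illustrative clipping threshold
    x_clipped = np.clip(x, -t, t)            # saturate the outliers
    qmax = 2 ** (num_bits - 1) - 1
    scale = t / qmax                         # finer step than max-abs scaling
    return np.clip(np.round(x_clipped / scale), -qmax, qmax) * scale
```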
“…Meller et al. (2019) suggest weight factorization that arranges the network to be more tolerant of quantization by equalizing channels and removing outliers. A similar approach has recently been suggested by Zhao et al. (2019), who suggest duplicating channels containing outliers and halving their values to move outliers toward the center of the distribution without changing network functionality. Unlike our method, which focuses on 4-bit quantization, the focus of these schemes was post-training quantization for larger bitwidths.…”
Section: Previous Work (mentioning)
confidence: 97%
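The channel-splitting idea described in this statement can be sketched directly (a hedged illustration, not the authors' implementation): duplicate the input channel that holds the outlier weight and halve both copies, which preserves the layer's output as long as the matching input activation is duplicated as well.

```python
# Hedged sketch of outlier channel splitting on a linear layer: duplicate the
# input channel containing the largest-magnitude weight and halve both copies.
# Functionality is preserved because the corresponding input is duplicated too.
import numpy as np

def split_outlier_channel(W, x):
    """W: (out_features, in_features) weights, x: (in_features,) input."""
    c = np.unravel_index(np.argmax(np.abs(W)), W.shape)[1]  # column with the outlier
    W_split = np.concatenate([W, W[:, [c]] / 2.0], axis=1)  # append a halved copy
    W_split[:, c] /= 2.0                                    # halve the original column
    x_split = np.concatenate([x, x[[c]]])                   # duplicate matching input
    return W_split, x_split

rng = np.random.default_rng(0)
W = rng.normal(0, 0.05, size=(4, 8)); W[2, 5] = 1.0         # plant one outlier weight
x = rng.normal(size=8)
W2, x2 = split_outlier_channel(W, x)
assert np.allclose(W @ x, W2 @ x2)                          # output unchanged
print(np.abs(W).max(), np.abs(W2).max())                    # outlier magnitude halved
```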
“…However, to compensate for the accuracy loss, this method relies on a run-time per-channel quantization scheme for activations, which is inefficient and not hardware friendly. Along similar lines, the OCS method (Zhao et al., 2019) proposes to eliminate the outliers for better accuracy with minimal overhead. Though these methods considerably reduce the time taken for quantization, they are unfortunately tightly coupled with training data for quantization.…”
Section: Post-Training Quantization Based Methods (mentioning)
confidence: 99%
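To make the per-tensor versus per-channel distinction mentioned here concrete, the sketch below is illustrative only (not either cited method): a single activation scale is compared against one scale per channel, which is more accurate but adds run-time bookkeeping.

```python
# Hedged illustration: per-tensor quantization uses one scale for the whole
# activation tensor; per-channel quantization uses one scale per channel.
import numpy as np

def quantize(x, scale, num_bits=8):
    qmax = 2 ** (num_bits - 1) - 1
    return np.clip(np.round(x / scale), -qmax, qmax) * scale

rng = np.random.default_rng(0)
# 16 channels with very different dynamic ranges
act = rng.normal(0, 1.0, size=(16, 32, 32)) * rng.uniform(0.1, 4.0, size=(16, 1, 1))

per_tensor = quantize(act, np.abs(act).max() / 127)
per_channel_scales = np.abs(act).max(axis=(1, 2), keepdims=True) / 127
per_channel = quantize(act, per_channel_scales)

print("per-tensor MSE: ", np.mean((act - per_tensor) ** 2))
print("per-channel MSE:", np.mean((act - per_channel) ** 2))
```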
“…Benefits of compression include faster training, faster inference, and fewer resources required to design more energy-efficient applications. Post-training compression techniques such as pruning (removing less important filters) and quantization (using lower-precision representations for weights) have been proposed [6,21,33,37,38]. Pre-training compression approaches focus on designing smaller networks to begin with [8,9].…”
Section: Introduction (mentioning)
confidence: 99%
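The two post-training compression operations named in this statement can be sketched generically (an illustration only, not any specific cited method): magnitude-based filter pruning zeroes the least important filters, and uniform quantization maps weights onto a lower-precision grid.

```python
# Generic, hedged sketch of post-training compression: magnitude-based filter
# pruning followed by symmetric uniform weight quantization.
import numpy as np

def prune_filters(W, keep_ratio=0.75):
    """Zero out conv filters with the smallest L1 norm. W: (out_ch, in_ch, k, k)."""
    norms = np.abs(W).sum(axis=(1, 2, 3))
    keep = norms >= np.quantile(norms, 1.0 - keep_ratio)
    return W * keep[:, None, None, None]

def quantize_weights(W, num_bits=8):
    """Map weights onto a symmetric uniform num_bits grid."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.abs(W).max() / qmax
    return np.clip(np.round(W / scale), -qmax, qmax) * scale

W = np.random.default_rng(0).normal(0, 0.05, size=(64, 32, 3, 3))
W_compressed = quantize_weights(prune_filters(W))
```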