2019
DOI: 10.48550/arxiv.1901.09504
Preprint

Improving Neural Network Quantization without Retraining using Outlier Channel Splitting

Abstract: Quantization can improve the execution latency and energy efficiency of neural networks on both commodity GPUs and specialized accelerators. The majority of existing literature focuses on training quantized DNNs, while this work examines the less-studied topic of quantizing a floating-point model without (re)training. DNN weights and activations follow a bell-shaped distribution post-training, while practical hardware uses a linear quantization grid. This leads to challenges in dealing with outliers in the distribution…
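As a rough illustration of the problem the abstract describes, the sketch below (plain NumPy, not the authors' code) quantizes a bell-shaped weight vector onto a symmetric linear grid and shows how a single outlier inflates the quantization step, wasting resolution on the bulk of the values.

```python
# Hedged sketch: symmetric linear quantization, with the step size set by the
# largest-magnitude value. A single outlier coarsens the grid for everything else.
import numpy as np

def linear_quantize(x, num_bits=4):
    """Symmetric uniform quantization of x to num_bits signed levels."""
    qmax = 2 ** (num_bits - 1) - 1           # e.g. 7 positive levels for 4 bits
    scale = np.max(np.abs(x)) / qmax         # step size set by the largest magnitude
    q = np.clip(np.round(x / scale), -qmax, qmax)
    return q * scale, scale

rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.05, size=1000)   # bell-shaped bulk of weights
weights[0] = 1.0                             # a single outlier

_, step_with_outlier = linear_quantize(weights)
_, step_without = linear_quantize(weights[1:])
print(f"step with outlier: {step_with_outlier:.4f}, without: {step_without:.4f}")
```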

Cited by 23 publications (29 citation statements). References 23 publications.
“…We found no advantage in doing any kind of weight clipping. This is in line with earlier works that also report no advantage to weight clipping for larger bitwidths (Migacz, 2017; Zhao et al., 2019). Therefore, ACIQ was considered for quantizing activations only.…”
Section: Applicability (supporting)
confidence: 84%
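For context on the clipping discussed in this statement, the snippet below is a generic clip-then-quantize sketch; the percentile threshold is an arbitrary illustrative choice, not ACIQ's analytically derived clipping value.

```python
# Hedged sketch of clipping before quantization: saturating the tails allows a
# finer step size for the bulk of the distribution. The 99.9th-percentile
# threshold is an assumption for illustration, not ACIQ's analytic threshold.
import numpy as np

def clipped_quantize(x, num_bits=8, clip_pct=99.9):
    """Clip tails at a percentile threshold, then quantize on a symmetric grid."""
    t = np.percentile(np.abs(x), clip_pct)   # illustrative clipping threshold
    x_clipped = np.clip(x, -t, t)            # saturate the outliers
    qmax = 2 ** (num_bits - 1) - 1
    scale = t / qmax                         # finer step than max-abs scaling
    return np.clip(np.round(x_clipped / scale), -qmax, qmax) * scale
```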
“…Meller et al. (2019) suggest weight factorization that arranges the network to be more tolerant of quantization by equalizing channels and removing outliers. A similar approach has recently been suggested by Zhao et al. (2019), who suggest duplicating channels containing outliers and halving their values to move outliers toward the center of the distribution without changing network functionality. Unlike our method, which focuses on 4-bit quantization, the focus of these schemes was post-training quantization for larger bitwidths.…”
Section: Previous Work (mentioning)
confidence: 97%
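The channel-splitting idea described in this statement can be sketched directly (a hedged illustration, not the authors' implementation): duplicate the input channel that holds the outlier weight and halve both copies, which preserves the layer's output as long as the matching input activation is duplicated as well.

```python
# Hedged sketch of outlier channel splitting on a linear layer: duplicate the
# input channel containing the largest-magnitude weight and halve both copies.
# Functionality is preserved because the corresponding input is duplicated too.
import numpy as np

def split_outlier_channel(W, x):
    """W: (out_features, in_features) weights, x: (in_features,) input."""
    c = np.unravel_index(np.argmax(np.abs(W)), W.shape)[1]  # column with the outlier
    W_split = np.concatenate([W, W[:, [c]] / 2.0], axis=1)  # append a halved copy
    W_split[:, c] /= 2.0                                    # halve the original column
    x_split = np.concatenate([x, x[[c]]])                   # duplicate matching input
    return W_split, x_split

rng = np.random.default_rng(0)
W = rng.normal(0, 0.05, size=(4, 8)); W[2, 5] = 1.0         # plant one outlier weight
x = rng.normal(size=8)
W2, x2 = split_outlier_channel(W, x)
assert np.allclose(W @ x, W2 @ x2)                          # output unchanged
print(np.abs(W).max(), np.abs(W2).max())                    # outlier magnitude halved
```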
“…However, to compensate for the accuracy loss, this method relies on a run-time per-channel quantization scheme for activations, which is inefficient and not hardware friendly. Along similar lines, the OCS method (Zhao et al., 2019) proposes to eliminate the outliers for better accuracy with minimal overhead. Though these methods considerably reduce the time taken for quantization, they are unfortunately tightly coupled with training data for quantization.…”
Section: Post-Training Quantization Based Methods (mentioning)
confidence: 99%
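To make the per-tensor versus per-channel distinction mentioned here concrete, the sketch below is illustrative only (not either cited method): a single activation scale is compared against one scale per channel, which is more accurate but adds run-time bookkeeping.

```python
# Hedged illustration: per-tensor quantization uses one scale for the whole
# activation tensor; per-channel quantization uses one scale per channel.
import numpy as np

def quantize(x, scale, num_bits=8):
    qmax = 2 ** (num_bits - 1) - 1
    return np.clip(np.round(x / scale), -qmax, qmax) * scale

rng = np.random.default_rng(0)
# 16 channels with very different dynamic ranges
act = rng.normal(0, 1.0, size=(16, 32, 32)) * rng.uniform(0.1, 4.0, size=(16, 1, 1))

per_tensor = quantize(act, np.abs(act).max() / 127)
per_channel_scales = np.abs(act).max(axis=(1, 2), keepdims=True) / 127
per_channel = quantize(act, per_channel_scales)

print("per-tensor MSE: ", np.mean((act - per_tensor) ** 2))
print("per-channel MSE:", np.mean((act - per_channel) ** 2))
```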
“…Benefits of compression include faster training, faster inference, and fewer resources required to design more energy-efficient applications. Post-training compression techniques such as pruning (removing less important filters) and quantization (using lower-precision representations for weights) have been proposed [6,21,33,37,38]. Pre-training compression approaches focus on designing smaller networks to begin with [8,9].…”
Section: Introduction (mentioning)
confidence: 99%
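The two post-training compression operations named in this statement can be sketched generically (an illustration only, not any specific cited method): magnitude-based filter pruning zeroes the least important filters, and uniform quantization maps weights onto a lower-precision grid.

```python
# Generic, hedged sketch of post-training compression: magnitude-based filter
# pruning followed by symmetric uniform weight quantization.
import numpy as np

def prune_filters(W, keep_ratio=0.75):
    """Zero out conv filters with the smallest L1 norm. W: (out_ch, in_ch, k, k)."""
    norms = np.abs(W).sum(axis=(1, 2, 3))
    keep = norms >= np.quantile(norms, 1.0 - keep_ratio)
    return W * keep[:, None, None, None]

def quantize_weights(W, num_bits=8):
    """Map weights onto a symmetric uniform num_bits grid."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.abs(W).max() / qmax
    return np.clip(np.round(W / scale), -qmax, qmax) * scale

W = np.random.default_rng(0).normal(0, 0.05, size=(64, 32, 3, 3))
W_compressed = quantize_weights(prune_filters(W))
```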