Low-Power Computer Vision 2022
DOI: 10.1201/9781003162810-6

Fast Adjustable Threshold for Uniform Neural Network Quantization

Abstract: Neural network quantization is a highly desired procedure to perform before running neural networks on mobile devices. Quantization without fine-tuning leads to an accuracy drop of the model, whereas the commonly used training with quantization is done on the full set of labeled data and is therefore both time- and resource-consuming. Real-life applications require a simplified and accelerated quantization procedure that maintains the accuracy of the full-precision neural network, especially for modern mobile …
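The abstract is truncated above, so the paper's exact threshold-adjustment rule is not shown in this excerpt. As a rough illustration of the kind of uniform, threshold-based quantization the title refers to (a hedged sketch, not the paper's algorithm), the snippet below quantizes a tensor symmetrically to int8 with a clipping threshold that could in principle be tuned; the function names and the naive max-based threshold are illustrative assumptions.

```python
import numpy as np

def uniform_quantize(x, threshold, num_bits=8):
    """Symmetric uniform quantization of a tensor to signed integers.

    `threshold` defines the clipping range [-threshold, threshold]; moving it
    trades clipping error against rounding error, which is what makes an
    adjustable threshold useful.
    """
    qmax = 2 ** (num_bits - 1) - 1           # 127 for int8
    scale = threshold / qmax                 # step size of the uniform grid
    q = np.clip(np.round(x / scale), -qmax, qmax)
    return q.astype(np.int8), scale

def dequantize(q, scale):
    """Map integer codes back to the real axis."""
    return q.astype(np.float32) * scale

# Illustrative use: quantize a weight tensor with a naive max-based threshold.
w = np.random.randn(64, 64).astype(np.float32)
q, s = uniform_quantize(w, threshold=float(np.abs(w).max()))
w_hat = dequantize(q, s)
print("reconstruction MSE:", float(np.mean((w - w_hat) ** 2)))
```

The fast adjustment advertised in the title would replace the naive max-based threshold with one tuned on a small amount of data; that procedure is not reproduced here.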

Cited by 9 publications (11 citation statements). References 5 publications.
“…This paper focuses on post-training quantization. Previous works on post-training quantization [7,12,3,17] focus on building generic quantized models. HAGO introduces a novel pipeline that generates optimized quantized models based on the backend specifications.…”
Section: Related Work
confidence: 99%
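The statement above contrasts generic post-training quantization with HAGO's backend-aware pipeline. As a generic illustration of what post-training calibration typically involves (not HAGO's actual pipeline or API), the sketch below records per-layer activation ranges from a few unlabeled batches; `model_forward` and the percentile rule are hypothetical placeholders.

```python
import numpy as np

def calibrate_activation_scales(model_forward, calib_batches, percentile=99.9):
    """Generic post-training calibration sketch: run a handful of unlabeled
    batches through the float model and keep a clipping range per layer.

    `model_forward(batch)` is assumed (hypothetically) to return a dict
    mapping layer names to their activation arrays.
    """
    ranges = {}
    for batch in calib_batches:
        for name, act in model_forward(batch).items():
            hi = float(np.percentile(np.abs(act), percentile))
            ranges[name] = max(ranges.get(name, 0.0), hi)
    # Turn each range into an int8 scale for later uniform quantization.
    return {name: rng / 127.0 for name, rng in ranges.items()}
```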
“…Although the asymmetric design offers superior properties compared to the symmetric design [6,14], it comes with an additional cost at inference time when both weights and activations use this scheme. Equation (21) shows the computations such a multiply-accumulate engine needs to make. Note that the zero-point for the bias is usually omitted [13]:…”
Section: Asymmetric Quantization
confidence: 99%
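Equation (21) of the cited work is not reproduced in this excerpt, but the extra inference-time cost it refers to is the standard expansion of an asymmetric (affine) multiply-accumulate, where the weight and activation zero-points introduce cross terms that a symmetric scheme avoids. The sketch below uses a generic min-max affine quantizer, which is an assumption rather than the cited paper's exact quantizer.

```python
import numpy as np

def affine_quantize(x, num_bits=8):
    """Asymmetric (affine) quantization: x ≈ scale * (q - zero_point)."""
    qmin, qmax = 0, 2 ** num_bits - 1
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / (qmax - qmin) or 1.0
    zero_point = int(round(qmin - lo / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.int64)
    return q, scale, zero_point

rng = np.random.default_rng(0)
w, x = rng.standard_normal(256), rng.standard_normal(256)
qw, sw, zw = affine_quantize(w)
qx, sx, zx = affine_quantize(x)

# Asymmetric MAC: sw*sx * (qw·qx - zw*sum(qx) - zx*sum(qw) + N*zw*zx).
# The three extra terms are the per-inference cost the quote refers to;
# with symmetric quantization (zw = zx = 0) only the first term remains.
n = len(x)
acc = qw @ qx - zw * qx.sum() - zx * qw.sum() + n * zw * zx
print(sw * sx * acc, w @ x)   # close up to quantization error
```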
“…Quantization: DNN Quantization [23] is usually motivated by faster DNN inference, e.g., through fixed-point quantization and arithmetic [2], [24], [25], and energy savings. To avoid reduced accuracy, quantization is considered during training [26], [27] instead of post-training or with finetuning [28], [29], [30], [31], enabling low-bit quantization such as binary DNNs [32], [33]. Some works also consider quantizing activations [32], [34], [35] or gradients [36], [37], [38].…”
Section: Related Work
confidence: 99%
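The survey statement above distinguishes post-training quantization from quantization considered during training. As a hedged sketch (not any specific cited method), the forward quantize-dequantize below is the usual "fake quantization" building block of quantization-aware training; in practice the rounding is paired with a straight-through estimator so gradients can flow, and 1-bit weights are the binary extreme the quote mentions.

```python
import numpy as np

def fake_quantize(x, num_bits=8):
    """Quantize-dequantize ("fake quantization") forward pass as used in
    quantization-aware training: the model sees quantization noise during
    training, while gradients are typically passed straight through the
    rounding (straight-through estimator)."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = float(np.abs(x).max()) / qmax or 1.0
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale                        # float values on the integer grid

w = np.random.randn(4, 4).astype(np.float32)
print(fake_quantize(w, num_bits=8))         # 8-bit fake-quantized weights
print(np.sign(w) * np.abs(w).mean())        # 1-bit case: a common binarization rule
```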