2021
DOI: 10.1109/access.2021.3138756

DoubleQExt: Hardware and Memory Efficient CNN Through Two Levels of Quantization

Abstract: To fulfil the tight area and memory constraints in IoT applications, the design of efficient Convolutional Neural Network (CNN) hardware becomes crucial. Quantization of CNN is one of the promising approaches that allow a large CNN to be compressed into a much smaller one, which is very suitable for IoT applications. Among various proposed quantization schemes, Power-of-two (PoT) quantization enables efficient hardware implementation and small memory consumption for CNN accelerators, but requires retraining of …

Cited by 9 publications (8 citation statements)
References 22 publications
“…DoubleQExt [88] quantizes weights and activations to 8-bit integers using layer-wise FP32 scalar and offset parameters. Thereafter, it quantizes the integer weights again to represent them in power-of-2 form using 5 bits, thus, reducing computational and memory cost.…”
Section: Mixed-precision Quantization (mentioning)
confidence: 99%
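The quoted description amounts to two stacked quantization steps: an affine INT8 step with a layer-wise FP32 scale and offset, followed by re-quantization of the integer weights to the nearest power of two so that each weight fits a 5-bit code. The following is a minimal NumPy sketch of that idea; the function names, rounding choices and bit layout are illustrative assumptions, not the DoubleQExt implementation.

import numpy as np

def quantize_int8(w):
    # Level 1 (sketch): affine INT8 quantization with a layer-wise FP32 scale and offset.
    scale = (w.max() - w.min()) / 255.0                    # layer-wise FP32 scalar
    zero_point = np.round(-w.min() / scale) - 128          # layer-wise offset
    q = np.clip(np.round(w / scale) + zero_point, -128, 127)
    return q.astype(np.int8), scale, zero_point

def requantize_pot(q):
    # Level 2 (sketch): map each nonzero INT8 weight to the nearest power of two.
    # A sign bit plus a small exponent field (and a code for zero) fits in 5 bits,
    # so the weight multiply in a MAC unit reduces to a shift.
    mag = np.abs(q.astype(np.int32)).astype(np.float64)
    exp = np.rint(np.log2(np.maximum(mag, 1.0)))           # nearest exponent, >= 0
    pot = np.sign(q) * np.power(2.0, exp)
    return np.where(q == 0, 0.0, pot)

w = np.random.randn(64, 64).astype(np.float32)             # one layer's FP32 weights
q, scale, zero_point = quantize_int8(w)
w_pot = requantize_pot(q)                                  # power-of-two-valued weights
# Approximate dequantization of w: scale * (w_pot - zero_point)

Note that the second pass only touches the weights, consistent with the quote: activations stay INT8 while the stored weights shrink from 8 to 5 bits and their multiplications become shifts.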
“…As a summary of the literature, improving the accuracy of quantized DNNs comes at the expense of floating-point computational cost in [30], [32], [34], [35], [38], [42], [45], [56]-[58], [61], [63]-[67], [69], [74], [76], [78]-[80], [82]-[84], [86]-[88]. Specifically, these approaches scale output activations of each layer with FP32 coefficient(s) to recover the dynamic range, and/or perform batch normalization as well as the operations of first and last layers with FP32 data structures.…”
Section: Mixed-precision Quantization (mentioning)
confidence: 99%
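The floating-point overhead described here usually enters at the re-scaling step: an INT8 layer accumulates its products in INT32, and the accumulator is multiplied by layer-wise FP32 coefficient(s) to recover the dynamic range before the result is re-quantized for the next layer. A generic NumPy sketch of that pattern follows; the shapes and scale values are made-up illustrations, not taken from any of the cited designs.

import numpy as np

rng = np.random.default_rng(0)
x_q = rng.integers(-128, 128, size=(1, 64), dtype=np.int8)    # INT8 input activations
w_q = rng.integers(-128, 128, size=(64, 32), dtype=np.int8)   # INT8 weights
acc = x_q.astype(np.int32) @ w_q.astype(np.int32)             # integer MACs into an INT32 accumulator

x_scale, w_scale, y_scale = 0.02, 0.01, 0.05                  # assumed layer-wise FP32 scales
y_fp32 = acc.astype(np.float32) * (x_scale * w_scale)         # FP32 multiply recovers the dynamic range
y_q = np.clip(np.round(y_fp32 / y_scale), -128, 127).astype(np.int8)  # re-quantize for the next layer

This per-layer FP32 multiply, together with any FP32 batch normalization or FP32 first/last layers, is the cost the passage attributes to those approaches.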
“…Recently, there has been a growing interest in post-training quantization [7, 12, 14, 29-31]. Post-training quantization quantizes the pre-trained model parameters without any further retraining epochs after quantization.…”
Section: Quantization (mentioning)
confidence: 99%
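As a generic illustration of the post-training workflow described here (no retraining after quantization), PyTorch's dynamic quantization API can convert an already-trained model with a single call; this is an example of the general technique, not the procedure used by any specific paper cited above.

import torch
import torch.nn as nn

# Stand-in for a pre-trained network; in practice the trained weights would be loaded.
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
model.eval()

# Post-training quantization: Linear weights are converted to INT8 and no
# further training epochs are run afterwards.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

x = torch.randn(1, 128)
print(quantized(x).shape)   # torch.Size([1, 10])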