2021
DOI: 10.1109/access.2021.3138756

DoubleQExt: Hardware and Memory Efficient CNN Through Two Levels of Quantization

Abstract: To fulfil the tight area and memory constraints in IoT applications, the design of efficient Convolutional Neural Network (CNN) hardware becomes crucial. Quantization of CNN is one of the promising approaches that allow a large CNN to be compressed into a much smaller one, which is very suitable for IoT applications. Among various proposed quantization schemes, Power-of-two (PoT) quantization enables efficient hardware implementation and small memory consumption for CNN accelerators, but requires retraining of …

Cited by 9 publications (8 citation statements)
References 22 publications
“…DoubleQExt [88] quantizes weights and activations to 8-bit integers using layer-wise FP32 scalar and offset parameters. Thereafter, it quantizes the integer weights again to represent them in power-of-2 form using 5 bits, thus, reducing computational and memory cost.…”
Section: Mixed-precision Quantization (mentioning)
confidence: 99%
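The quoted description amounts to two stacked quantization steps: an affine INT8 step with a layer-wise FP32 scale and offset, followed by re-quantization of the integer weights to the nearest power of two so that each weight fits a 5-bit code. The following is a minimal NumPy sketch of that idea; the function names, rounding choices and bit layout are illustrative assumptions, not the DoubleQExt implementation.

import numpy as np

def quantize_int8(w):
    # Level 1 (sketch): affine INT8 quantization with a layer-wise FP32 scale and offset.
    scale = (w.max() - w.min()) / 255.0                    # layer-wise FP32 scalar
    zero_point = np.round(-w.min() / scale) - 128          # layer-wise offset
    q = np.clip(np.round(w / scale) + zero_point, -128, 127)
    return q.astype(np.int8), scale, zero_point

def requantize_pot(q):
    # Level 2 (sketch): map each nonzero INT8 weight to the nearest power of two.
    # A sign bit plus a small exponent field (and a code for zero) fits in 5 bits,
    # so the weight multiply in a MAC unit reduces to a shift.
    mag = np.abs(q.astype(np.int32)).astype(np.float64)
    exp = np.rint(np.log2(np.maximum(mag, 1.0)))           # nearest exponent, >= 0
    pot = np.sign(q) * np.power(2.0, exp)
    return np.where(q == 0, 0.0, pot)

w = np.random.randn(64, 64).astype(np.float32)             # one layer's FP32 weights
q, scale, zero_point = quantize_int8(w)
w_pot = requantize_pot(q)                                  # power-of-two-valued weights
# Approximate dequantization of w: scale * (w_pot - zero_point)

Note that the second pass only touches the weights, consistent with the quote: activations stay INT8 while the stored weights shrink from 8 to 5 bits and their multiplications become shifts.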
“…As a summary of the literature, improving the accuracy of quantized DNNs comes at the expense of floating-point computational cost in [30], [32], [34], [35], [38], [42], [45], [56]-[58], [61], [63]-[67], [69], [74], [76], [78]-[80], [82]-[84], [86]-[88]. Specifically, these approaches scale output activations of each layer with FP32 coefficient(s) to recover the dynamic range, and/or perform batch normalization as well as the operations of first and last layers with FP32 data structures.…”
Section: Mixed-precision Quantization (mentioning)
confidence: 99%
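The floating-point overhead described here usually enters at the re-scaling step: an INT8 layer accumulates its products in INT32, and the accumulator is multiplied by layer-wise FP32 coefficient(s) to recover the dynamic range before the result is re-quantized for the next layer. A generic NumPy sketch of that pattern follows; the shapes and scale values are made-up illustrations, not taken from any of the cited designs.

import numpy as np

rng = np.random.default_rng(0)
x_q = rng.integers(-128, 128, size=(1, 64), dtype=np.int8)    # INT8 input activations
w_q = rng.integers(-128, 128, size=(64, 32), dtype=np.int8)   # INT8 weights
acc = x_q.astype(np.int32) @ w_q.astype(np.int32)             # integer MACs into an INT32 accumulator

x_scale, w_scale, y_scale = 0.02, 0.01, 0.05                  # assumed layer-wise FP32 scales
y_fp32 = acc.astype(np.float32) * (x_scale * w_scale)         # FP32 multiply recovers the dynamic range
y_q = np.clip(np.round(y_fp32 / y_scale), -128, 127).astype(np.int8)  # re-quantize for the next layer

This per-layer FP32 multiply, together with any FP32 batch normalization or FP32 first/last layers, is the cost the passage attributes to those approaches.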
“…Recently, there has been a growing interest in post-training quantization [7, 12, 14, 29-31]. Post-training quantization quantizes the pre-trained model parameters without any further retraining epochs after quantization.…”
Section: Quantization (mentioning)
confidence: 99%
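As a generic illustration of the post-training workflow described here (no retraining after quantization), PyTorch's dynamic quantization API can convert an already-trained model with a single call; this is an example of the general technique, not the procedure used by any specific paper cited above.

import torch
import torch.nn as nn

# Stand-in for a pre-trained network; in practice the trained weights would be loaded.
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
model.eval()

# Post-training quantization: Linear weights are converted to INT8 and no
# further training epochs are run afterwards.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

x = torch.randn(1, 128)
print(quantized(x).shape)   # torch.Size([1, 10])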