HFOD: A hardware-friendly quantization method for object detection on embedded FPGAs

Zhang, Fei; Gao, Ziyang; Huang, Jiaming; Zhen, Peining; Chen, Hai-Bao; Yan, Jie

doi:10.1587/elex.19.20220067

Cited by 4 publications

(3 citation statements)

References 23 publications


“…Many accelerators with execution prediction are also proposed to reduce the computation of DNNs [8,9]. Quantization is a common step in DNNs deployment [10,11,12]. However, dynamic bit-width quantization has not been well studied.…”

Section: Introduction (mentioning)

confidence: 99%

Sample-wise dynamic precision quantization for neural network acceleration

Xiong, Huang, et al. 2022, IEICE Electron. Express


Quantization is a well-known method for deep neural networks (DNNs) compression and acceleration. In this work, we propose the Sample-Wise Dynamic Precision (SWDP) quantization scheme, which can switch the bit-width of weights and activations in the model according to the task difficulty of input samples at runtime. Using low-precision networks for easy input images brings advantages in terms of computational and energy efficiency. We also propose an adaptive hardware design for the efficient implementation of our SWDP networks. The experimental results on various networks and datasets demonstrate that our SWDP achieves an average of 3.3× speedup and 3.0× energy saving over the bit-level dynamically composable architecture BitFusion.



“…The designs proposed in [16,17,21] can deal with convolutions of several common kernel sizes, but it is still not applicable to convolutions of any kernel sizes. The authors in [22][23][24][25][26] adopt multiple computing engines to deal with the convolution with different kernel sizes for improving performance. However, this design costs too much hardware resources, which is not suitable for low-cost FPGAs.…”

Section: Introduction (mentioning)

confidence: 99%

“…Thus, without elaborately design, this mismatch between data and computation elements may lead to ultra-low utilization of FPGA resources, which is undesirable for low-cost FPGAs. For example, in [16][17][18][19][20][21][22][23][24][25][26][27][28][29][30][31], although the size of the CNN model is significantly reduced, the bit utilization of FPGA resources is very low. To solve the above problems, this brief proposes an efficient hardware accelerator for low-bit quantized lightweight CNN models and the contribution can be summarized as follows:…”

Section: Introduction (mentioning)

confidence: 99%

Design and implementation of an efficient CNN accelerator for low-cost FPGAs

Wang et al. 2022, IEICE Electron. Express


This paper proposes a computation-array-centered dataflow, which adjusts the convolution with different kernel sizes to a unified computing manner and reduces the dimension of computation array from 2D to 1D, so as to maximize the utilization of the computation elements offered by the accelerator. Furthermore, a single unit multiple data (SUMD) strategy is proposed to effectively alleviate the mismatch between the quantized data and the hardware resources with fixed bit width on FPGA. As a case study, an 8-bit MobileNetV2 model has been implemented on the low-cost ZYNQ XC7Z020 FPGA, whose FPS/DSP and GOPS/DSP achieve up to 0.55 and 0.35 respectively.
