2021
DOI: 10.1109/tcsvt.2020.3020569

Layer-Specific Optimization for Mixed Data Flow With Mixed Precision in FPGA Design for CNN-Based Object Detectors

Abstract: Convolutional neural networks (CNNs) require both intensive computation and frequent memory access, which lead to low processing speed and large power dissipation. Although the characteristics of the different layers in a CNN are frequently quite different, previous hardware designs have employed common optimization schemes for them. This paper proposes a layer-specific design that employs different organizations that are optimized for the different layers. The proposed design employs two layer-specific opti…
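As one way to picture the mixed-precision idea in the title, here is a minimal sketch of per-layer bit-width assignment with uniform symmetric quantization. The `choose_bitwidth` policy, the 8/4-bit split, and the layer names are illustrative assumptions, not the paper's actual scheme.

```python
import numpy as np

# Illustrative sketch, NOT the paper's algorithm: assign a bit-width per layer
# instead of one network-wide precision, then uniformly quantize each tensor.

def choose_bitwidth(layer_idx, num_layers, high_bits=8, low_bits=4):
    """Toy policy: keep more bits in early layers, fewer in deeper ones."""
    return high_bits if layer_idx < num_layers // 2 else low_bits

def quantize(weights, bits):
    """Uniform symmetric quantization of a weight tensor to `bits` bits."""
    qmax = 2 ** (bits - 1) - 1
    peak = float(np.abs(weights).max())
    scale = peak / qmax if peak > 0 else 1.0
    return np.round(weights / scale).clip(-qmax, qmax) * scale

rng = np.random.default_rng(0)
net = {f"conv{i}": rng.standard_normal((16, 16)) for i in range(6)}
for i, (name, w) in enumerate(net.items()):
    bits = choose_bitwidth(i, len(net))
    err = np.abs(w - quantize(w, bits)).max()
    print(f"{name}: {bits}-bit, max quantization error = {err:.4f}")
```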

Cited by 68 publications (18 citation statements)
References 53 publications

“…Because convolutional neural networks (CNNs) are used in the field of computer vision, the accuracy of object detection and classification has increased dramatically [1]-[4]. However, because a DNN uses many layers, a large number of parameters are required, which significantly increases computational complexity [5]-[7]. In particular, in DNN-based object detection, classification and localization are performed simultaneously, which requires vast computation [8]-[11].…”
Section: Introduction
confidence: 99%
“…The quantisation was homogeneous across the entire network each time, i.e., each quantisation configuration applied to all parameters. Combining layer-specific dataflow optimisation with layer-specific quantisation allows models to fit entirely in on-chip BRAM, removing off-chip memory accesses and thereby improving throughput [44]. In [45], a mixed-precision quantisation scheme applies layer-wise priority in inverse order of layer depth, based on the finding that binarising different layers has a widely varied effect on accuracy loss.…”
Section: Profile Guided Automating Compression
confidence: 99%
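The scheme attributed to [45] can be illustrated with a small greedy sketch: visit layers in inverse order of depth and binarise while an accuracy-loss budget holds. The sensitivity values and the 2% budget below are placeholders, not figures from the cited papers.

```python
# Minimal sketch of the layer-wise priority scheme described for [45]: layers
# are considered for binarization deepest-first. `accuracy_drop` stands in for
# a measured per-layer sensitivity; the budget is an arbitrary illustration.

def plan_binarization(layers, accuracy_drop, budget=0.02):
    """Binarize layers deepest-first while the cumulative estimated
    accuracy loss stays within `budget`; others keep 8-bit weights."""
    plan, total = {}, 0.0
    for name in reversed(layers):      # inverse order of layer depth
        drop = accuracy_drop[name]
        if total + drop <= budget:
            plan[name] = 1             # 1-bit (binary) weights
            total += drop
        else:
            plan[name] = 8             # keep higher precision
    return plan

layers = ["conv1", "conv2", "conv3", "conv4"]
sensitivity = {"conv1": 0.030, "conv2": 0.015, "conv3": 0.008, "conv4": 0.004}
print(plan_binarization(layers, sensitivity))
# {'conv4': 1, 'conv3': 1, 'conv2': 8, 'conv1': 8}
```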
“…However, lightweight CNN models contain a variety of kernel sizes, which challenges the design of FPGA-based CNN accelerators. Most existing designs [12]-[21] can effectively handle convolutions with certain specified kernel sizes. However, when the kernel size changes, the utilization of the PE units in the computation array is significantly reduced.…”
Section: Introduction
confidence: 99%
“…However, when the kernel size changes, the utilization of the PE units in the computation array is significantly reduced. The designs proposed in [16,17,21] can handle convolutions with several common kernel sizes, but they are still not applicable to convolutions of arbitrary kernel size. The authors in [22]-[26] adopt multiple computing engines to handle convolutions with different kernel sizes and thereby improve performance.…”
Section: Introduction
confidence: 99%
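The utilization issue described in the two statements above lends itself to a toy model. The sketch below packs k x k kernel windows into a fixed PE array; the 12x12 array size and the no-window-splitting rule are illustrative assumptions, not details of the cited designs.

```python
# Toy utilization model for a fixed PE array: a k x k kernel window may not be
# split across PEs, so border PEs that cannot hold a full window sit idle.

def pe_utilization(rows, cols, k):
    """Fraction of PEs doing useful work when k x k kernel windows are
    packed into a rows x cols array without splitting any window."""
    used = ((rows // k) * k) * ((cols // k) * k)
    return used / (rows * cols)

for k in (1, 3, 5, 7, 11):
    print(f"k={k:2d}: utilization = {pe_utilization(12, 12, k):.2f}")
# k= 1: 1.00, k= 3: 1.00, k= 5: 0.69, k= 7: 0.34, k=11: 0.84
```

The sharp drop at k=5 and k=7 mirrors the cited observation that an array sized for one kernel shape loses much of its throughput when the kernel size changes.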