2021
DOI: 10.1109/tcsvt.2020.3020569

Layer-Specific Optimization for Mixed Data Flow With Mixed Precision in FPGA Design for CNN-Based Object Detectors

Abstract: Convolutional neural networks (CNNs) require both intensive computation and frequent memory access, which lead to low processing speed and large power dissipation. Although the characteristics of the different layers in a CNN are frequently quite different, previous hardware designs have employed common optimization schemes for them. This paper proposes a layer-specific design that employs different organizations that are optimized for the different layers. The proposed design employs two layer-specific opti…
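As one way to picture the mixed-precision idea in the title, here is a minimal sketch of per-layer bit-width assignment with uniform symmetric quantization. The `choose_bitwidth` policy, the 8/4-bit split, and the layer names are illustrative assumptions, not the paper's actual scheme.

```python
import numpy as np

# Illustrative sketch, NOT the paper's algorithm: assign a bit-width per layer
# instead of one network-wide precision, then uniformly quantize each tensor.

def choose_bitwidth(layer_idx, num_layers, high_bits=8, low_bits=4):
    """Toy policy: keep more bits in early layers, fewer in deeper ones."""
    return high_bits if layer_idx < num_layers // 2 else low_bits

def quantize(weights, bits):
    """Uniform symmetric quantization of a weight tensor to `bits` bits."""
    qmax = 2 ** (bits - 1) - 1
    peak = float(np.abs(weights).max())
    scale = peak / qmax if peak > 0 else 1.0
    return np.round(weights / scale).clip(-qmax, qmax) * scale

rng = np.random.default_rng(0)
net = {f"conv{i}": rng.standard_normal((16, 16)) for i in range(6)}
for i, (name, w) in enumerate(net.items()):
    bits = choose_bitwidth(i, len(net))
    err = np.abs(w - quantize(w, bits)).max()
    print(f"{name}: {bits}-bit, max quantization error = {err:.4f}")
```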

Cited by 68 publications (18 citation statements)
References 53 publications

“…Because convolutional neural networks (CNNs) are used in the field of computer vision, the accuracy of object detection and classification has increased dramatically [1]-[4]. However, because a DNN uses many layers, a large number of parameters are required, which significantly increases computational complexity [5]-[7]. In particular, in DNN-based object detection, classification and localization are performed simultaneously, which requires vast computation [8]-[11].…”
Section: Introduction
confidence: 99%
“…The quantisation was homogeneous across the entire network each time, i.e., each quantisation configuration applied to all parameters. Combining layer-specific dataflow optimisation with layer-specific quantisation allows models to fit entirely in on-chip BRAM, removing off-chip memory accesses and thereby improving throughput [44]. In [45], a mixed-precision quantisation scheme applies layer-wise priority in inverse order of layer depth, based on the finding that binarising different layers has a widely varied effect on accuracy loss.…”
Section: Profile Guided Automating Compression
confidence: 99%
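The scheme attributed to [45] can be illustrated with a small greedy sketch: visit layers in inverse order of depth and binarise while an accuracy-loss budget holds. The sensitivity values and the 2% budget below are placeholders, not figures from the cited papers.

```python
# Minimal sketch of the layer-wise priority scheme described for [45]: layers
# are considered for binarization deepest-first. `accuracy_drop` stands in for
# a measured per-layer sensitivity; the budget is an arbitrary illustration.

def plan_binarization(layers, accuracy_drop, budget=0.02):
    """Binarize layers deepest-first while the cumulative estimated
    accuracy loss stays within `budget`; others keep 8-bit weights."""
    plan, total = {}, 0.0
    for name in reversed(layers):      # inverse order of layer depth
        drop = accuracy_drop[name]
        if total + drop <= budget:
            plan[name] = 1             # 1-bit (binary) weights
            total += drop
        else:
            plan[name] = 8             # keep higher precision
    return plan

layers = ["conv1", "conv2", "conv3", "conv4"]
sensitivity = {"conv1": 0.030, "conv2": 0.015, "conv3": 0.008, "conv4": 0.004}
print(plan_binarization(layers, sensitivity))
# {'conv4': 1, 'conv3': 1, 'conv2': 8, 'conv1': 8}
```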
“…However, lightweight CNN models contain a variety of kernel sizes, which challenges the design of FPGA-based CNN accelerators. Most existing designs [12]-[21] can effectively handle convolutions with certain specified kernel sizes. However, when the kernel size changes, the utilization of the PE units in the computation array is significantly reduced.…”
Section: Introduction
confidence: 99%
“…However, when the kernel size changes, the utilization of the PE units in the computation array is significantly reduced. The designs proposed in [16,17,21] can handle convolutions with several common kernel sizes, but they are still not applicable to convolutions of arbitrary kernel size. The authors in [22]-[26] adopt multiple computing engines to handle convolutions with different kernel sizes and thereby improve performance.…”
Section: Introduction
confidence: 99%
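The utilization issue described in the two statements above lends itself to a toy model. The sketch below packs k x k kernel windows into a fixed PE array; the 12x12 array size and the no-window-splitting rule are illustrative assumptions, not details of the cited designs.

```python
# Toy utilization model for a fixed PE array: a k x k kernel window may not be
# split across PEs, so border PEs that cannot hold a full window sit idle.

def pe_utilization(rows, cols, k):
    """Fraction of PEs doing useful work when k x k kernel windows are
    packed into a rows x cols array without splitting any window."""
    used = ((rows // k) * k) * ((cols // k) * k)
    return used / (rows * cols)

for k in (1, 3, 5, 7, 11):
    print(f"k={k:2d}: utilization = {pe_utilization(12, 12, k):.2f}")
# k= 1: 1.00, k= 3: 1.00, k= 5: 0.69, k= 7: 0.34, k=11: 0.84
```

The sharp drop at k=5 and k=7 mirrors the cited observation that an array sized for one kernel shape loses much of its throughput when the kernel size changes.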