2021
DOI: 10.1587/elex.18.20210012

An FPGA-based accelerator for deep neural network with novel reconfigurable architecture

Abstract: Owing to its high parallelism, the dataflow architecture is a common solution for deep neural network (DNN) acceleration; however, existing DNN acceleration solutions exhibit limited flexibility across diverse network models. This paper presents a novel reconfigurable architecture as a DNN acceleration solution, consisting of circuit blocks that can all be reconfigured to adapt to different networks while maintaining high throughput. The proposed architecture shows good transferability to diverse DNN models due to its reconfigura…

Citations: cited by 9 publications (5 citation statements)
References: 33 publications (48 reference statements)
“…Identifying an efficient dataflow is crucial for defining the spatial structure of the PEs and the overall PEA. There are four common dataflows [8]: no local reuse (NLR) [4,9], input stationary (IS) [10], output stationary (OS) [11,12], and weight stationary (WS) [6,13]. The NLR dataflow, typically implemented in a tree structure, does not reuse data, leading to higher hardware costs.…”
Section: Related Work
confidence: 99%
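
The four dataflows named in this statement differ only in which operand each processing element (PE) keeps local while the other operands stream past it. Below is a minimal Python sketch, not taken from the cited papers, contrasting the output-stationary and weight-stationary loop orders on a 1-D convolution; the function names and the 1-D setup are illustrative assumptions.

```python
import numpy as np

def conv1d_output_stationary(x, w):
    """Output-stationary (OS) schedule: each output element's partial sum
    stays in a local accumulator while inputs and weights stream past it."""
    n_out = len(x) - len(w) + 1
    y = np.zeros(n_out)
    for o in range(n_out):          # one "PE" per output position
        acc = 0.0                   # partial sum held locally (stationary)
        for k in range(len(w)):     # stream inputs/weights through the PE
            acc += x[o + k] * w[k]
        y[o] = acc
    return y

def conv1d_weight_stationary(x, w):
    """Weight-stationary (WS) schedule: each weight is pinned in a PE and
    every input that needs it streams by, accumulating into the output."""
    n_out = len(x) - len(w) + 1
    y = np.zeros(n_out)
    for k in range(len(w)):         # one "PE" per weight tap (stationary)
        for o in range(n_out):      # stream inputs past the pinned weight
            y[o] += x[o + k] * w[k]
    return y
```

Both functions compute the same result; only the loop order, and hence the operand that stays resident in a PE, changes. In the OS version the accumulator `acc` plays the role of the register held inside a PE; in the WS version the pinned weight `w[k]` does.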
“…Notably, the ResNet network introduced the concept of residual blocks, enabling the training of deeper neural networks, which helps mitigate the vanishing gradient problem [14]. Work on FPGAs, which offer the advantages of low latency, low power consumption, and high flexibility over traditional hardware acceleration solutions, has been widely carried out [15–27]. However, they face limitations in on-chip resources, and modifications in network architecture necessitate hardware circuit redesign.…”
Section: Introduction
confidence: 99%
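
For readers unfamiliar with the residual blocks this statement refers to, here is a minimal numpy sketch of the idea: the identity shortcut gives gradients a path around the learned mapping, which is what eases vanishing gradients in deep networks. The fully connected form and the names below are illustrative assumptions, not ResNet's actual convolutional block.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """Minimal fully connected residual block: y = ReLU(x + F(x)),
    where F(x) = W2 @ ReLU(W1 @ x). The identity shortcut `x + fx`
    lets gradients bypass F during backpropagation."""
    fx = w2 @ relu(w1 @ x)   # the learned residual mapping F(x)
    return relu(x + fx)      # identity shortcut added before activation
```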
“…However, NMS is a greedy algorithm that is computationally intensive and has a complexity of O(N²), leading to increased processing time for a large number of detected targets. Recently, many FPGA-based and ASIC edge neural network acceleration chips [7,8,9,10,11,12,13,14], such as UNPU [11], Eyeriss [12], and CASSANN-v2 [13], have been proposed to target general neural network operations (i.e., convolution). However, when deploying object detection neural networks, these chips often offload the NMS algorithm to the on-chip embedded CPU, significantly increasing the end-to-end inference time of object detection neural networks at the edge. Therefore, it is vital to develop a customized circuit to reduce the computation time of the NMS algorithm at the edge.…”
Section: Introduction
confidence: 99%
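
The O(N²) cost this statement attributes to NMS comes from the pairwise overlap tests in the greedy loop. Below is a minimal numpy sketch of greedy non-maximum suppression, an illustrative reference version rather than the customized circuit the citing authors argue for; `iou`, `nms`, and the (x1, y1, x2, y2) box layout are assumed names and conventions.

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: repeatedly keep the highest-scoring box and suppress
    the remaining boxes that overlap it too much. The pairwise IoU tests
    give the O(N^2) worst case noted in the citation above."""
    order = np.argsort(scores)[::-1]   # indices by descending score
    keep = []
    while len(order) > 0:
        best = order[0]
        keep.append(int(best))
        # compare the winner against every remaining box: O(N) per pick
        order = np.array([i for i in order[1:]
                          if iou(boxes[best], boxes[i]) < iou_threshold])
    return keep
```

Each iteration keeps the highest-scoring survivor and filters the rest, so with N boxes and few suppressions the loop performs on the order of N²/2 IoU evaluations; that worst case is what a dedicated hardware NMS unit must bound.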