2021
DOI: 10.3390/electronics10091025

Accelerating Neural Network Inference on FPGA-Based Platforms—A Survey

Abstract: The breakthrough of deep learning has started a technological revolution in various areas such as object identification, image/video recognition, and semantic segmentation. Neural networks, a representative application of deep learning, have been widely used, and many efficient models have been developed. However, edge implementation of neural network inference is restricted by the conflict between the high computation and storage complexity and resource-limited hardware platforms in application sce…

Cited by 58 publications (25 citation statements). References 107 publications.
“…By storing and efficiently reusing data within the memory hierarchy, the number of accesses to costlier memories can be greatly reduced. Additionally, the authors in [1] and [105] also provide different data-flow schemes that exploit the aforementioned data reuse opportunities,…”
Section: B. Digital Accelerators
confidence: 99%
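The data-reuse argument in the statement above can be made concrete with a small back-of-the-envelope model. The sketch below is a minimal Python illustration, not taken from [1] or [105]; the layer shape, the two dataflows, and the one-word-per-fetch cost model are all assumptions. It counts off-chip word transfers for a convolution under a naive dataflow versus a weight-stationary one that loads weights into on-chip buffers once.

```python
# Minimal sketch (assumed cost model): count DRAM word transfers for a
# convolutional layer under two dataflows. Layer sizes are hypothetical.

def naive_accesses(c_in, c_out, h, w, k):
    """Every MAC fetches its input word and weight word from DRAM (no reuse)."""
    macs = c_out * c_in * h * w * k * k
    return 2 * macs  # one input word + one weight word per MAC

def weight_stationary_accesses(c_in, c_out, h, w, k):
    """Weights are loaded once into on-chip buffers and reused across all
    h*w output positions; inputs are still streamed from DRAM per MAC."""
    weight_words = c_out * c_in * k * k          # each weight fetched once
    input_words = c_out * c_in * h * w * k * k   # inputs streamed per MAC
    return weight_words + input_words

if __name__ == "__main__":
    shape = dict(c_in=64, c_out=64, h=56, w=56, k=3)
    naive = naive_accesses(**shape)
    ws = weight_stationary_accesses(**shape)
    print(f"naive DRAM words:             {naive:,}")
    print(f"weight-stationary DRAM words: {ws:,} ({naive / ws:.2f}x fewer)")
```

Even this crude model shows roughly a 2x cut in costly transfers from reusing weights alone; the dataflow schemes the quote refers to additionally reuse input activations and partial sums across the memory hierarchy.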
“…Hardware resources required for each can be estimated from the generated generic PEs and layer parameters (Eqs. 11–14); these estimations allow the tool to generate multiple designs based on resource limitations and user input. Block RAM (BRAM) tiles are currently assumed to be RAMB36E1, which can be used as RAMB18E1 and FIFO18E1 [34] when needed.…”
Section: Proposed Convolutional Layer Design
confidence: 99%
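To illustrate the kind of resource estimation the statement describes, here is a minimal sketch with an assumed sizing model; it is not the cited tool's Eqs. 11–14. It sizes weight and line-buffer storage for a convolutional layer and rounds up to 36 Kb RAMB36E1 tiles.

```python
# Minimal sketch (assumed sizing model): estimate BRAM tiles for one
# conv layer from its parameters, rounding up to 36 Kb RAMB36E1 tiles.

import math

RAMB36E1_BITS = 36 * 1024  # one RAMB36E1 tile holds 36 Kb of storage

def bram_tiles(c_in, c_out, k, w, bits=16):
    """Tiles needed to hold all kernel weights plus a (k-1)-row line buffer."""
    weight_bits = c_out * c_in * k * k * bits     # full kernel storage
    line_buffer_bits = (k - 1) * w * c_in * bits  # rows buffered for the window
    return math.ceil((weight_bits + line_buffer_bits) / RAMB36E1_BITS)

# Example: a 3x3 conv, 3 -> 32 channels, 224-pixel-wide input, 16-bit data.
print(bram_tiles(c_in=3, c_out=32, k=3, w=224))  # -> 1 tile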
“…This is crucial given the rapid changes in CNN architectures. Until recently, most works only investigated the hardware implementation of forward-pass CNNs as inference engines and accelerators; there is plenty of research on mapping the CNN forward pass onto FPGAs for embedded inference [2]–[18]. In contrast, there is a clear lack of work in the areas of online deployment and training on FPGAs. But with the recent breakthroughs in the new field of Continuous Learning [19]–[23], online training on embedded platforms has attracted more research.…”
Section: Introduction
confidence: 99%
“…As an example, in the case of image classification, moving from the eight-layered AlexNet [4] to the 152-layered ResNet [5], error rates have been reduced by more than 10%, but the number of multiply-and-accumulate (MAC) operations performed has increased by more than 80%. Such a trend makes it evident that ad hoc designed hardware accelerators are essential for deploying CNN algorithms in real-time and power-constrained systems [6].…”
Section: Introduction
confidence: 99%
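The MAC count the quote refers to follows from a standard per-layer formula, computed below for an illustrative layer shape; the shape is an assumption, not AlexNet's or ResNet's exact configuration.

```python
# Minimal sketch: MAC count of one convolutional layer from its shape.

def conv_macs(c_in, c_out, h_out, w_out, k):
    """Multiply-and-accumulate operations for a k x k convolutional layer."""
    return c_out * h_out * w_out * c_in * k * k

# Example: a 3x3 conv producing a 56x56x64 output from 64 input channels.
print(f"{conv_macs(64, 64, 56, 56, 3):,} MACs")  # -> 115,605,504 MACs
```

Summing this over a network's layers gives the workload figure whose growth from AlexNet to ResNet motivates dedicated accelerators.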