Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays 2015
DOI: 10.1145/2684746.2689060
Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks

Abstract: Convolutional neural networks (CNNs) have been widely employed for image recognition because they can achieve high accuracy by emulating the behavior of optic nerves in living creatures. Recently, the rapid growth of modern applications based on deep learning algorithms has further spurred research and implementations. In particular, various accelerators for deep CNNs have been proposed on FPGA platforms, which offer high performance, reconfigurability, and fast development cycles. Although current FP…

Citations: Cited by 1,702 publications (1,081 citation statements)
References: 16 publications
“…There is much work related to CNN accelerator design on FPGAs. Zhang et al. [16] use the roofline model and data dependency analysis to optimise a convolution-only CNN architecture. Qiu et al. [7] successfully deploy VGGNet on an embedded FPGA platform, with several optimisation techniques such as data quantisation and coefficient matrix decomposition.…”
Section: Background and Related Work (mentioning)
confidence: 99%
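The roofline model referenced above bounds attainable performance by the minimum of the compute roof and the product of the computation-to-communication (CTC) ratio and off-chip bandwidth. The following is a minimal sketch of that estimate; the platform numbers (100 GFLOP/s roof, 4.5 GB/s bandwidth) and the CTC sweep range are illustrative assumptions, not figures taken from the paper.

```c
/* Roofline-style estimate for design-space exploration of an FPGA CNN
 * accelerator. All platform constants below are hypothetical. */
#include <stdio.h>

/* Attainable performance = min(compute roof, CTC ratio * bandwidth). */
static double attainable_gflops(double comp_roof_gflops,
                                double ctc_flop_per_byte,
                                double bandwidth_gbps) {
    double bw_bound = ctc_flop_per_byte * bandwidth_gbps;
    return bw_bound < comp_roof_gflops ? bw_bound : comp_roof_gflops;
}

int main(void) {
    double comp_roof = 100.0;  /* hypothetical peak throughput, GFLOP/s */
    double bandwidth = 4.5;    /* hypothetical off-chip bandwidth, GB/s  */
    /* Sweep the CTC ratio achieved by different loop-tiling choices and
     * report whether each design point is memory or compute bound. */
    for (double ctc = 5.0; ctc <= 40.0; ctc += 5.0) {
        double perf = attainable_gflops(comp_roof, ctc, bandwidth);
        printf("CTC = %5.1f FLOP/byte -> %6.1f GFLOP/s (%s bound)\n",
               ctc, perf, perf < comp_roof ? "memory" : "compute");
    }
    return 0;
}
```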
“…Many ANN applications that interact with physical systems require the accuracy and dynamic range offered by floating-point representations, which increases the complexity of each neuron. FPGAs are an ideal platform for accelerating ANN-based systems because they enable large-scale parallelism while also supporting high-throughput floating-point computation [3], [4], [9].…”
Section: Related Work (mentioning)
confidence: 99%
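The computation being parallelized in that statement is essentially a floating-point dot product per neuron. A minimal sketch of one such neuron is shown below; the function and array names are hypothetical, and the comment describes only the usual way an HLS flow would parallelize the loop, not a specific design from the cited works.

```c
/* One floating-point neuron: dot product plus bias, then ReLU.
 * On an FPGA, a synthesis tool would typically unroll this loop by a fixed
 * factor and map the multiply-adds onto parallel floating-point units. */
#include <stddef.h>

float neuron_forward(const float *weights, const float *inputs,
                     size_t n, float bias) {
    float acc = bias;
    for (size_t i = 0; i < n; ++i) {
        acc += weights[i] * inputs[i];
    }
    /* ReLU activation as a simple example nonlinearity. */
    return acc > 0.0f ? acc : 0.0f;
}
```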
“…In contrast, a CNN has the unique feature that its filter weights are heavily reused as each image is scanned. Benefiting from this feature, many dedicated CNN hardware accelerators have been reported [10][11][12]. Most reported CNN accelerators focus only on accelerating the convolution part while ignoring the pooling function, which is a common layer in CNN networks.…”
Section: Introduction (mentioning)
confidence: 99%
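The weight reuse described above is visible in the standard convolution loop nest: every filter coefficient is re-read at every output position. A minimal sketch follows, with illustrative (hypothetical) layer dimensions; it is this reuse that loop-unfolding and tiling optimizations exploit.

```c
/* Direct convolution loop nest with hypothetical layer dimensions.
 * Every element of weights[][][][] is re-read for each (row, col) output
 * position, which is the weight reuse the citing authors describe. */
#define OUT_CH 16
#define IN_CH   8
#define OUT_H  32
#define OUT_W  32
#define K       3   /* filter size */

void conv_layer(float out[OUT_CH][OUT_H][OUT_W],
                const float in[IN_CH][OUT_H + K - 1][OUT_W + K - 1],
                const float weights[OUT_CH][IN_CH][K][K])
{
    for (int oc = 0; oc < OUT_CH; ++oc)
        for (int row = 0; row < OUT_H; ++row)
            for (int col = 0; col < OUT_W; ++col) {
                float acc = 0.0f;
                for (int ic = 0; ic < IN_CH; ++ic)
                    for (int kr = 0; kr < K; ++kr)
                        for (int kc = 0; kc < K; ++kc)
                            acc += weights[oc][ic][kr][kc] *
                                   in[ic][row + kr][col + kc];
                out[oc][row][col] = acc;
            }
}
```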
“…In [10], a CNN hardware accelerator using a spatial architecture with 168 processing elements is demonstrated. In [11], another dedicated convolution accelerator with loop-unfolding optimization is reported. Since the pooling function is not implemented in those accelerators, the convolution results must be transferred to a CPU/GPU to run the pooling function and then fed back to the accelerator to compute the next layer.…”
Section: Introduction (mentioning)
confidence: 99%
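For reference, the pooling layer that these citing authors note is offloaded to the CPU/GPU amounts to a simple windowed maximum. The sketch below shows 2x2 max pooling with stride 2 under hypothetical dimensions; it is meant only to illustrate how little logic the missing layer requires, not any particular accelerator's implementation.

```c
/* 2x2 max pooling with stride 2 over hypothetical feature-map dimensions. */
#define CH   16
#define IN_H 32
#define IN_W 32

void max_pool_2x2(float out[CH][IN_H / 2][IN_W / 2],
                  const float in[CH][IN_H][IN_W])
{
    for (int c = 0; c < CH; ++c)
        for (int r = 0; r < IN_H / 2; ++r)
            for (int q = 0; q < IN_W / 2; ++q) {
                float m = in[c][2 * r][2 * q];
                /* Take the maximum over the 2x2 window. */
                for (int dr = 0; dr < 2; ++dr)
                    for (int dq = 0; dq < 2; ++dq) {
                        float v = in[c][2 * r + dr][2 * q + dq];
                        if (v > m) m = v;
                    }
                out[c][r][q] = m;
            }
}
```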