Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks

Zhang, Chen; Li, Peng; Sun, Guangyu; Guan, Yijin; Xiao, Bingjun; Cong, Jason

doi:10.1145/2684746.2689060

Cited by 1,702 publications (1,081 citation statements)

References 16 publications

Citation statements by type: Mentioning 1,067


“…There is much work related to CNN accelerator design on FPGA. Zhang et al [16] use the roofline model and data dependencies analysis to optimise a convolution-only CNN architecture. Qiu et al [7] successfully deploy VGGNet on an embedded FPGA platform, with several optimisation techniques like data quantisation and coefficient matrix decomposition.…”

Section: Background and Related Work (mentioning)

confidence: 99%

Optimizing CNN-Based Object Detection Algorithms on Embedded FPGA Platforms

Zhao, Niu, Wu, et al. (2017)

Lecture Notes in Computer Science


Abstract. Algorithms based on Convolutional Neural Network (CNN) have recently been applied to object detection applications, greatly improving their performance. However, many devices intended for these algorithms have limited computation resources and strict power consumption constraints, and are not suitable for algorithms designed for GPU workstations. This paper presents a novel method to optimise CNN-based object detection algorithms targeting embedded FPGA platforms. Given parameterised CNN hardware modules, an optimisation flow takes network architectures and resource constraints as input, and tunes hardware parameters with algorithm-specific information to explore the design space and achieve high performance. The evaluation shows that our design model accuracy is above 85% and, with optimised configuration, our design can achieve 49.6 times speed-up compared with software implementation.
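Zhang et al. are cited above for using the roofline model to optimise a convolution accelerator, and Zhao et al. describe a flow that tunes hardware parameters under resource constraints. The sketch below illustrates that idea for one convolution layer: attainable performance is the minimum of the computational roof and the computation-to-communication (CTC) ratio times DRAM bandwidth, and a brute-force search picks the unroll factors Tm and Tn under a multiply-accumulate (MAC) budget. It is a simplified model under stated assumptions (stride 1, 32-bit words, output tile held on chip while input channels accumulate, fully pipelined MAC array, no on-chip memory limit); the names ConvLayer, attainable_gflops, explore, and all numeric values are hypothetical and not taken from either paper.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ConvLayer:
    M: int  # output feature maps
    N: int  # input feature maps
    R: int  # output rows
    C: int  # output columns
    K: int  # square kernel size; stride 1 is assumed throughout


def total_ops(layer: ConvLayer) -> int:
    # Each multiply-accumulate counts as two operations (mul + add).
    return 2 * layer.M * layer.N * layer.R * layer.C * layer.K * layer.K


def cycles(layer: ConvLayer, Tm: int, Tn: int, Tr: int, Tc: int) -> int:
    # A Tm x Tn MAC array, assumed fully pipelined, spends Tr*Tc*K*K
    # cycles on every (m, n, r, c) tile iteration.
    trips = (math.ceil(layer.M / Tm) * math.ceil(layer.N / Tn)
             * math.ceil(layer.R / Tr) * math.ceil(layer.C / Tc))
    return trips * Tr * Tc * layer.K * layer.K


def dram_bytes(layer: ConvLayer, Tm: int, Tn: int, Tr: int, Tc: int,
               word_bytes: int = 4) -> int:
    # Simplified traffic model (assumption): each output tile stays on chip
    # while all input-channel tiles stream in, then is written back once.
    out_tiles = (math.ceil(layer.M / Tm) * math.ceil(layer.R / Tr)
                 * math.ceil(layer.C / Tc))
    in_tile = Tn * (Tr + layer.K - 1) * (Tc + layer.K - 1)  # window incl. halo
    w_tile = Tm * Tn * layer.K * layer.K
    out_tile = Tm * Tr * Tc
    words = out_tiles * (math.ceil(layer.N / Tn) * (in_tile + w_tile) + out_tile)
    return words * word_bytes


def attainable_gflops(layer, Tm, Tn, Tr, Tc, freq_hz, bw_bytes_per_s):
    ops = total_ops(layer)
    comp_roof = ops / cycles(layer, Tm, Tn, Tr, Tc) * freq_hz  # ops per second
    ctc = ops / dram_bytes(layer, Tm, Tn, Tr, Tc)              # ops per byte
    # Roofline: performance is capped by compute or by memory bandwidth.
    return min(comp_roof, ctc * bw_bytes_per_s) / 1e9


def explore(layer, max_macs, freq_hz, bw_bytes_per_s, Tr, Tc):
    """Brute-force search over (Tm, Tn) with Tm * Tn <= max_macs."""
    best = None
    for Tm in range(1, layer.M + 1):
        for Tn in range(1, layer.N + 1):
            if Tm * Tn > max_macs:
                break  # larger Tn only exceeds the budget further
            perf = attainable_gflops(layer, Tm, Tn, Tr, Tc,
                                     freq_hz, bw_bytes_per_s)
            if best is None or perf > best[0]:
                best = (perf, Tm, Tn)
    return best


if __name__ == "__main__":
    layer = ConvLayer(M=64, N=32, R=28, C=28, K=3)  # hypothetical layer
    print(explore(layer, max_macs=64, freq_hz=100e6,
                  bw_bytes_per_s=4.5e9, Tr=28, Tc=28))
    # prints (12.8, 2, 32)
```

With ample bandwidth every candidate in this example is compute-bound, so the search favours factors that divide M and N evenly and waste no cycles; lowering the bandwidth argument shifts the optimum toward points with a higher CTC ratio.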



“…Many ANN applications that interact with physical systems require the accuracy and dynamic range offered by floating point representations, resulting in increased complexity at each neuron. FPGAs represent an ideal platform for accelerating ANN-based systems because they enable large scale parallelism while also supporting high throughput floating point computations [3], [4], [9].…”

Section: Related Work (mentioning)

confidence: 99%

Accelerated Artificial Neural Networks on FPGA for Fault Detection in Automotive Systems

Shreejith, Anshuman, Fahmy (2016)

Proceedings of the 2016 Design, Automation & Test in Europe Conference & Exhibition (DATE)


“…On the contrary, the CNN has its unique feature that the filters' weights will be largely reused throughout each image during scanning. Benefiting from this feature, many dedicated CNN hardware accelerators are reported [10][11][12]. Most of reported CNN accelerators only focus on accelerating the convolution part while ignoring the implementation of the pooling function, which is a common layer in the CNN network.…”

Section: Introduction (mentioning)

confidence: 99%

“…In [10], a CNN hardware accelerator using a spatial architecture with 168 processing elements is demonstrated. In [11], another dedicated convolution accelerator with loop-unfolding optimization is reported. Since pooling function is not implemented in those accelerators, the convolution results must be transferred to CPU/GPU to run pooling function and then fed back to the accelerator to compute the next layer.…”

Section: Introduction (mentioning)

confidence: 99%

A Reconfigurable Streaming Deep Convolutional Neural Network Accelerator for Internet of Things

Li et al. (2018)

IEEE Trans. Circuits Syst. I



Abstract. Convolutional neural network (CNN) offers significant accuracy in image detection. To implement image detection using CNN in internet of things (IoT) devices, a streaming hardware accelerator is proposed. The proposed accelerator optimizes energy efficiency by avoiding unnecessary data movement. With a unique filter decomposition technique, the accelerator can support arbitrary convolution window sizes. In addition, the max pooling function can be computed in parallel with convolution by using a separate pooling unit, thus improving throughput. A prototype accelerator was implemented in TSMC 65 nm technology with a core size of 5 mm². The accelerator can support major CNNs and achieves 152 GOPS peak throughput and 434 GOPS/W energy efficiency at 350 mW, making it a promising hardware accelerator for intelligent IoT devices.
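The abstract does not say how its filter decomposition works. One common way to let a fixed-size convolution engine handle arbitrary window sizes, sketched here as an assumption rather than the authors' scheme, is to split a K×K kernel into ⌈K/p⌉² zero-padded p×p sub-kernels, run each on a correspondingly shifted input window, and sum the partial results. The function names and the default engine size p = 3 are hypothetical.

```python
def conv2d_valid(x, w):
    """'Valid' 2-D cross-correlation, the operation a CNN conv layer computes."""
    kh, kw = len(w), len(w[0])
    out_h, out_w = len(x) - kh + 1, len(x[0]) - kw + 1
    return [[sum(x[i + u][j + v] * w[u][v]
                 for u in range(kh) for v in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def decompose(w, p=3):
    """Split a square K x K kernel into p x p blocks, zero-filling past the edge.

    Returns (row_offset, col_offset, block) triples."""
    K = len(w)
    blocks = []
    for r0 in range(0, K, p):
        for c0 in range(0, K, p):
            block = [[w[r0 + u][c0 + v] if r0 + u < K and c0 + v < K else 0
                      for v in range(p)]
                     for u in range(p)]
            blocks.append((r0, c0, block))
    return blocks


def zero_pad(x, extra):
    """Append `extra` zero rows and columns at the bottom and right of x."""
    width = len(x[0]) + extra
    return [row + [0] * extra for row in x] + [[0] * width for _ in range(extra)]


def conv2d_decomposed(x, w, p=3):
    """Compute conv2d_valid(x, w) using only p x p convolutions."""
    K = len(w)
    extra = -(-K // p) * p - K  # pad K up to a multiple of p
    xp = zero_pad(x, extra)
    out_h, out_w = len(x) - K + 1, len(x[0]) - K + 1
    out = [[0] * out_w for _ in range(out_h)]
    for r0, c0, block in decompose(w, p):
        # Input window seen by this block: the input shifted by (r0, c0).
        window = [row[c0:c0 + out_w + p - 1]
                  for row in xp[r0:r0 + out_h + p - 1]]
        partial = conv2d_valid(window, block)  # one pass of the p x p engine
        for i in range(out_h):
            for j in range(out_w):
                out[i][j] += partial[i][j]
    return out


if __name__ == "__main__":
    x = [[(3 * i + 5 * j) % 7 - 3 for j in range(9)] for i in range(8)]
    w5 = [[(i * j + i + 1) % 5 - 2 for j in range(5)] for i in range(5)]
    assert conv2d_decomposed(x, w5) == conv2d_valid(x, w5)
    print("5x5 kernel reproduced with", len(decompose(w5)), "3x3 passes")
```

Because every partial result uses the same p×p engine, only the accumulation step depends on K; for a 5×5 kernel and p = 3 this means four 3×3 passes, with the zero-filled weights in the edge blocks doing no useful work.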

