2019 29th International Conference on Field Programmable Logic and Applications (FPL)
DOI: 10.1109/fpl.2019.00032

A Data-Center FPGA Acceleration Platform for Convolutional Neural Networks

Abstract: Intensive computation is entering data centers with multiple workloads of deep learning. To balance the compute efficiency, performance, and total cost of ownership (TCO), the use of a field-programmable gate array (FPGA) with reconfigurable logic provides an acceptable acceleration capacity and is compatible with diverse computation-sensitive tasks in the cloud. In this paper, we develop an FPGA acceleration platform that leverages a unified framework architecture for general-purpose convolutional neural networks…

Cited by 23 publications (9 citation statements); references 29 publications.
“…This is partly due to the advantage of DYNAMAP's optimizations on dataflow and algorithm switching, partly due to the lower-precision we adopted enabling more PEs. Even if we scale down the systolic array size (2 DSP consumption per PE), in the worst case the performance will be halved and we still achieve 2× and 1.4× lower latency compared with [12] and [27] respectively. For Inception-v4, we compare with [31] which applies dynamic memory management to overcome data transfer bottlenecks and [25] that uses kn2row method for all layers in GoogleNet.…”
Section: Evaluation of Optimizations
confidence: 99%
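The kn2row lowering mentioned for [25] replaces a K×K convolution with K² small matrix multiplications (one 1×1 convolution per kernel position) whose partial outputs are shifted and accumulated. A minimal NumPy sketch, with a naive convolution as a correctness check; shapes and names are illustrative, not taken from the cited work:

```python
import numpy as np

def kn2row_conv(x, w):
    """K x K 'same' convolution via kn2row: one matrix multiply per kernel
    position (a 1x1 convolution), then shift-and-add the partial sums."""
    C, H, Wd = x.shape          # channels, height, width
    M, _, K, _ = w.shape        # output channels, input channels, K, K
    p = K // 2
    out = np.zeros((M, H, Wd))
    xf = x.reshape(C, H * Wd)   # flatten spatial dims for the matmuls
    for kh in range(K):
        for kw in range(K):
            # 1x1 convolution with the weights at this kernel position
            part = (w[:, :, kh, kw] @ xf).reshape(M, H, Wd)
            dy, dx = kh - p, kw - p
            # accumulate the partial result, shifted by (dy, dx)
            ys, ye = max(0, -dy), min(H, H - dy)
            xs, xe = max(0, -dx), min(Wd, Wd - dx)
            out[:, ys:ye, xs:xe] += part[:, ys + dy:ye + dy, xs + dx:xe + dx]
    return out

def direct_conv(x, w):
    """Naive reference convolution ('same' padding) for verification."""
    C, H, Wd = x.shape
    M, _, K, _ = w.shape
    p = K // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    out = np.zeros((M, H, Wd))
    for m in range(M):
        for i in range(H):
            for j in range(Wd):
                out[m, i, j] = np.sum(w[m] * xp[:, i:i + K, j:j + K])
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 8, 8))
w = rng.standard_normal((4, 3, 3, 3))
assert np.allclose(kn2row_conv(x, w), direct_conv(x, w))
```

The appeal for an accelerator is that every kernel position becomes the same dense matrix-multiply primitive, so one systolic array can serve all layer shapes.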
“…We achieve 286MHz frequency for both GoogleNet and Inception-v4 accelerator designs. GoogleNet acceleration using DYNAMAP significantly outperforms [12] and [27] in terms of both latency and throughput. This is partly due to the advantage of DYNAMAP's optimizations on dataflow and algorithm switching, partly due to the lower-precision we adopted enabling more PEs.…”
Section: Evaluation of Optimizations
confidence: 99%
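The PE-count reasoning in the statement above can be made concrete: at a fixed clock frequency, throughput scales with the number of processing elements, which is bounded by the DSP budget divided by the DSPs consumed per PE. A small sketch; the DSP budget is a hypothetical number for illustration, not from the source:

```python
def pe_count(dsp_budget: int, dsp_per_pe: int) -> int:
    """Processing elements that fit in a fixed DSP budget."""
    return dsp_budget // dsp_per_pe

DSP_BUDGET = 4096  # hypothetical device budget, for illustration only

pes_1dsp = pe_count(DSP_BUDGET, 1)  # lower precision: 1 DSP per PE
pes_2dsp = pe_count(DSP_BUDGET, 2)  # the quote's scaled-down case

# Doubling DSP consumption per PE halves the array, hence the
# "performance will be halved" worst case quoted above.
assert pes_2dsp * 2 == pes_1dsp
```

This is why lower arithmetic precision matters in the quoted comparison: packing a multiply into fewer DSPs directly enlarges the systolic array.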
“…Yu et al [27] developed an FPGA acceleration platform that leverages a unified framework architecture for general-purpose CNN inference acceleration at a data center achieving a throughput comparable with the state-of-the-art GPU in this field, with less latency. This work exploits on-chip DSPs, on a Kintex KU115, arranged as supertile units (SUs), to overcome the computational bound and, together with a dispatching-assembling model and broadcast caches, to deal with the memory bound.…”
Section: Related Work
confidence: 99%
“…In recent years, advances in integrated circuit technology have brought significant improvements to FPGA performance. Coupled with the inherently high hardware parallelism, FPGAs are used as hardware accelerators in more and more fields, such as signal processing [1,2], scientific computing [3][4][5], machine learning [6][7][8], and data centers [9][10][11]. In these applications, some algorithms include a large number of vector and matrix operations, which belong to the category of reduction problems.…”
Section: Introduction
confidence: 99%
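The reduction problems the last statement refers to (vector sums, dot products, row reductions in matrix kernels) map naturally onto the parallel adder trees FPGAs implement: log2(n) levels of independent operations instead of a sequential chain. A minimal sketch of that pairwise tree schedule, assuming an associative operator:

```python
def tree_reduce(values, op):
    """Pairwise (tree) reduction: each level combines independent pairs,
    the schedule an FPGA adder tree evaluates in parallel per level."""
    vals = list(values)
    if not vals:
        raise ValueError("cannot reduce an empty sequence")
    while len(vals) > 1:
        nxt = [op(vals[i], vals[i + 1]) for i in range(0, len(vals) - 1, 2)]
        if len(vals) % 2:          # odd element passes through to next level
            nxt.append(vals[-1])
        vals = nxt
    return vals[0]

# Sum of 1..5 via the tree schedule: levels [3,7,5] -> [10,5] -> [15]
assert tree_reduce([1, 2, 3, 4, 5], lambda a, b: a + b) == 15
```

In hardware the pairs within each level are combined simultaneously, so latency grows with the tree depth rather than the vector length.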