2018 IEEE 26th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM) 2018
DOI: 10.1109/fccm.2018.00021

FPDeep: Acceleration and Load Balancing of CNN Training on FPGA Clusters

Cited by 72 publications (30 citation statements)
References: 2 publications
“…Although the variety of works of CNNs on FPGA is very higher, only a few papers exploit a system with multiple FPGAs. This is the case of [8], [9] where a deeply pipelined multi-FPGA architecture is used both for training and inference of CNNs. However, deeply pipelined multi-FPGA architecture fits only a specific class of algorithms within the distributed scenarios and authors described a custom communication infrastructure to deal with distributed nodes communication, instead of trying to generalize the technique.…”
Section: Related Work (mentioning)
confidence: 99%
“…Most recently, with the growing demand in time performance, it is a trend to employ a cluster of FPGAs to execute DNNs [15, 26-32]. In [15,28], authors construct multiple FPGAs as a pipeline to execute a set of input images in a pipeline fashion.…”
Section: Related Work (mentioning)
confidence: 99%
“…In [26], authors split the CNN layers to balance pipeline stages for higher throughput and lower cost. Authors in [27] employ multiple FPGAs for the training phase. In [29,30], multi-FPGA platforms are utilized to accelerate the lung nodule segmentation.…”
Section: Related Work (mentioning)
confidence: 99%
“…Our co-exploration concept and the general framework, however, can also be easily extended to other hardware platforms such as ASICs. Since timing performance on a single FPGA is limited by its restricted resource, it is prevalent to organize multiple FPGAs in a pipelined fashion [20]-[23] to provide high throughput (frame per second, FPS). In such a system, the pipeline efficiency is one of the most important metrics needing to be maximized, since it determines the hardware utilization as well as energy efficiency.…”
Section: Introduction (mentioning)
confidence: 99%