Proceedings of the 54th Annual Design Automation Conference 2017
DOI: 10.1145/3061639.3062207

Automated Systolic Array Architecture Synthesis for High Throughput CNN Inference on FPGAs

Cited by 345 publications
(175 citation statements)
references
References 21 publications
“…Our work uses instead a higher precision data type (27 bits data, 18 bits weights), considering the specific hardware multiplier implementation using 27 by 18 multipliers, which cannot (easily) be tiled with lower size operators. Considering these limitations, our work positions between [2] and [3], and is one of the fastest implementations compared to state of the art, while providing an especially good accuracy. In order to significantly improve the performance further we will employ the Winograd transformation in future research.…”
Section: B Results and Comparison | mentioning
confidence: 99%
“…In [2], the authors propose an end-to-end automation flow for systolic array design synthesis. A 2D systolic array structure improves the timing and the data reuse of the design, and is obtained from the analysis of the nested loops implementing the considered algorithm.…”
Section: Maxj and Max5 Dfe | mentioning
confidence: 99%
“…For convolution layers, in which the processing is described in listing 6a, finding the optimal PE configuration can be seen as a loop optimization problem [39,9,28] [77,65,40,78,36,79,80,43]. This problem is addressed by applying loop optimization techniques such loop unrolling, loop tiling or loop interchange to the 7 nested loops of listing 6a.…”
Section: SIMD Accelerators and Loop Optimization | mentioning
confidence: 99%
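The loop-optimization view in the statement above can be illustrated with a minimal sketch. It is not taken from the cited paper or from "listing 6a", which is not reproduced here. It assumes a stride-1, unpadded convolution over batch, output channel, input channel, output row, output column, and two kernel dimensions (seven loops), and it applies loop tiling plus loop interchange to the two channel loops. The function names and the tile sizes `tm` and `tc` are hypothetical.

```python
import random


def random_tensor(shape, rng):
    """Build a nested list of small random integers with the given shape."""
    if len(shape) == 1:
        return [rng.randint(-3, 3) for _ in range(shape[0])]
    return [random_tensor(shape[1:], rng) for _ in range(shape[0])]


def _dims(x, w):
    """Return loop bounds for input x[n][c][h][w] and weights w[m][c][kh][kw]."""
    N, C, H, W = len(x), len(x[0]), len(x[0][0]), len(x[0][0][0])
    M, K = len(w), len(w[0][0])
    return N, C, M, K, H - K + 1, W - K + 1


def conv_naive(x, w):
    """Direct convolution as seven nested loops (stride 1, no padding)."""
    N, C, M, K, OH, OW = _dims(x, w)
    y = [[[[0] * OW for _ in range(OH)] for _ in range(M)] for _ in range(N)]
    for n in range(N):
        for m in range(M):
            for c in range(C):
                for oh in range(OH):
                    for ow in range(OW):
                        for kh in range(K):
                            for kw in range(K):
                                y[n][m][oh][ow] += (
                                    x[n][c][oh + kh][ow + kw] * w[m][c][kh][kw]
                                )
    return y


def conv_tiled(x, w, tm=2, tc=2):
    """Same result with the channel loops tiled and moved innermost.

    The inner (m, c) loops form a tm-by-tc block of independent
    multiply-accumulates, the part a hardware design would unroll
    into parallel processing elements.
    """
    N, C, M, K, OH, OW = _dims(x, w)
    y = [[[[0] * OW for _ in range(OH)] for _ in range(M)] for _ in range(N)]
    for n in range(N):
        for m0 in range(0, M, tm):          # tile loop over output channels
            for c0 in range(0, C, tc):      # tile loop over input channels
                for oh in range(OH):
                    for ow in range(OW):
                        for kh in range(K):
                            for kw in range(K):
                                # Intra-tile loops (candidates for unrolling).
                                for m in range(m0, min(m0 + tm, M)):
                                    for c in range(c0, min(c0 + tc, C)):
                                        y[n][m][oh][ow] += (
                                            x[n][c][oh + kh][ow + kw]
                                            * w[m][c][kh][kw]
                                        )
    return y


if __name__ == "__main__":
    rng = random.Random(0)
    x = random_tensor((1, 3, 5, 5), rng)   # batch, in-channels, height, width
    w = random_tensor((4, 3, 3, 3), rng)   # out-channels, in-channels, kh, kw
    print(conv_naive(x, w) == conv_tiled(x, w))
```

Because the inputs are integers, the reordered accumulation gives exactly the same output as the naive loop nest; with floating-point data the two versions could differ by rounding.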
“…This method is demonstrated to achieve higher hardware performance than iterative pruning due to the regularity in weight storage and computation. The second one is efficient hardware implementations, including FPGAs and ASICs [1,7,31,32,36,49,50,53,54]. FPGAs are gaining more popularity for striking a balance between high hardware performance and fast development round.…”
Section: Introduction | mentioning
confidence: 99%