2016 21st Asia and South Pacific Design Automation Conference (ASP-DAC)
DOI: 10.1109/aspdac.2016.7428073
Design space exploration of FPGA-based Deep Convolutional Neural Networks

Cited by 173 publications (72 citation statements) · References 9 publications
“…The result is then added to the previously obtained partial sum. However, as the kernel sizes (Nkx and Nky) are usually relatively small, stand-alone loop unrolling within one kernel window cannot provide enough parallelism to fully utilize the accelerator compute resources [34]. Fig.…”
Section: Convolutional Layer Of a Dnnmentioning
confidence: 99%
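The excerpt's point — that unrolling only the kernel-window loops exposes too little parallelism — can be seen in a minimal sketch of the computation. All names and sizes below are illustrative (the variable names Nkx, Nky follow the excerpt; the rest are assumed), not taken from any cited accelerator:

```python
# Minimal sketch of one convolutional layer's inner computation.
# Sizes are hypothetical; Nkx/Nky follow the excerpt's notation.
import random

Nkx, Nky = 3, 3          # kernel window (typically small, e.g. 3x3)
Nif, Nox, Noy = 4, 8, 8  # input channels, output width/height (assumed)

ifmap = [[[random.random() for _ in range(Noy + Nky - 1)]
          for _ in range(Nox + Nkx - 1)] for _ in range(Nif)]
weights = [[[random.random() for _ in range(Nky)]
            for _ in range(Nkx)] for _ in range(Nif)]

ofmap = [[0.0] * Noy for _ in range(Nox)]
for ci in range(Nif):
    for ox in range(Nox):
        for oy in range(Noy):
            # The two innermost loops cover one kernel window. Fully
            # unrolling only them yields just Nkx*Nky = 9 parallel
            # multiplies, far below what a large accelerator can sustain.
            partial = 0.0
            for kx in range(Nkx):
                for ky in range(Nky):
                    partial += weights[ci][kx][ky] * ifmap[ci][ox + kx][oy + ky]
            # add to the previously obtained partial sum
            ofmap[ox][oy] += partial

print(Nkx * Nky)  # → 9 (parallelism from kernel-window unrolling alone)
```

With a 3×3 kernel, stand-alone kernel-window unrolling caps the parallelism at 9 multiply-accumulates per cycle, which motivates the combined unrolling schemes the later excerpts describe.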
“…These loop unrolling types can be combined to further increase the parallelism in convolutional layer processing. For example, loop unrolling within the kernel window, across multiple input feature map channels, and across different kernels are employed together in [6], [34], [35], while loop unrolling within one kernel window and within one input feature map channel is utilized in [27].…”
Section: Convolutional Layer Of a Dnnmentioning
confidence: 99%
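Because each unrolling type contributes an independent factor, combining them multiplies the available parallelism. The factors below are illustrative values, not configurations reported in the cited works:

```python
# Hedged sketch: combining unroll factors across loop dimensions
# multiplies parallelism. All factor values are assumptions.
Pkx, Pky = 3, 3   # unroll within the kernel window
Pif = 4           # unroll across input feature-map channels
Pof = 16          # unroll across different kernels (output channels)

# Kernel-window unrolling alone:
print(Pkx * Pky)              # → 9 parallel MACs
# All three unrolling types combined:
print(Pkx * Pky * Pif * Pof)  # → 576 parallel MACs
```

The product, not the sum, of the factors determines how many multiply-accumulate units can be kept busy, which is why combined schemes scale to large FPGA fabrics.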
“…For convolution layers, in which the processing is described in listing 6a, finding the optimal PE configuration can be seen as a loop optimization problem [39,9,28] [77,65,40,78,36,79,80,43]. This problem is addressed by applying loop optimization techniques such as loop unrolling, loop tiling, or loop interchange to the 7 nested loops of listing 6a.…”
Section: Simd Accelerators and Loop Optimizationmentioning
confidence: 99%
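The excerpt's listing 6a is not reproduced here; a conventional formulation of the 7 nested loops of a convolutional layer (the usual target of unrolling, tiling, and interchange) might look like the following. Loop ordering, names, and sizes are assumptions for illustration:

```python
# Illustrative 7-nested-loop formulation of a convolutional layer.
# The excerpt's listing 6a is not available; names/sizes are assumed.
Nb = 1                 # batch
Nof, Nif = 2, 2        # output / input feature maps
Nox, Noy = 4, 4        # output width / height
Nkx, Nky = 3, 3        # kernel window

# All-ones data so results are easy to check by hand.
ifmap = [[[1.0]*(Noy+Nky-1) for _ in range(Nox+Nkx-1)] for _ in range(Nif)]
w = [[[[1.0]*Nky for _ in range(Nkx)] for _ in range(Nif)] for _ in range(Nof)]
ofmap = [[[[0.0]*Noy for _ in range(Nox)] for _ in range(Nof)] for _ in range(Nb)]

for n in range(Nb):                    # loop 1: batch
    for fo in range(Nof):              # loop 2: output feature maps (kernels)
        for fi in range(Nif):          # loop 3: input feature maps
            for ox in range(Nox):      # loop 4: output columns
                for oy in range(Noy):  # loop 5: output rows
                    for kx in range(Nkx):      # loop 6: kernel x
                        for ky in range(Nky):  # loop 7: kernel y
                            ofmap[n][fo][ox][oy] += (
                                w[fo][fi][kx][ky] * ifmap[fi][ox+kx][oy+ky])

# Each output element sums Nif*Nkx*Nky ones:
print(ofmap[0][0][0][0])  # → 18.0
```

Unrolling picks loops whose iterations run in parallel hardware, tiling partitions loop ranges to fit on-chip buffers, and interchange reorders the nest to improve data reuse; the cited works search over these choices.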
“…To address this optimization problem, a brute-force exploration is performed, as in [39,28,77,65,40,78]. This exploration is usually driven by the Roofline method [82] in order to select the feasible design solutions that match the maximum computational throughput and the maximum memory bandwidth a given platform can deliver [39,40,41]. The design space can also be explored by means of heuristic search algorithms, as proposed for instance in [35].…”
Section: Design Space Explorationmentioning
confidence: 99%
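The Roofline check that drives this pruning bounds a design point's attainable throughput by the smaller of the platform's peak compute rate and its memory bandwidth times the point's operational intensity. A minimal sketch, with illustrative platform numbers that are not taken from the cited papers:

```python
# Hedged sketch of the Roofline bound used to prune design points.
# Platform numbers are assumptions, not figures from the cited works.

def attainable_gflops(peak_gflops, bw_gbs, ops_per_byte):
    """Roofline bound: min(peak compute, bandwidth * operational intensity)."""
    return min(peak_gflops, bw_gbs * ops_per_byte)

PEAK, BW = 100.0, 10.0   # assumed platform: 100 GFLOP/s peak, 10 GB/s DRAM

# A memory-bound design point (low computation-to-communication ratio)...
print(attainable_gflops(PEAK, BW, 4.0))   # → 40.0, limited by bandwidth
# ...versus a compute-bound one past the ridge point:
print(attainable_gflops(PEAK, BW, 25.0))  # → 100.0, limited by peak compute
```

During exploration, each candidate (unroll, tile) configuration yields an operational intensity and a required bandwidth; configurations whose bound falls under the roofline, or whose bandwidth demand exceeds the platform's, are discarded.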
“…The FPGA has more advantages than other digital controllers in high-speed operation, low power consumption, parallel processing, and reconfigurable design. The convolutional neural network based on an FPGA is effectively used in image identification [6]. The FPGA accelerator for the 3D convolution design helps to avoid repeated loading of the processed feature maps [7]. The performance of deep convolutional neural networks is 1.9 to 250 times faster when utilizing an FPGA device [2] [8]. Controller analyses for nonlinear systems have been reported [9][10][11][12][13][14][15][16][17][18]. The real-time implementation of the convolution based on the NTT algorithm is evaluated using the FPGA devices Xilinx Spartan 3A DSP FPGA and Xilinx Virtex 6 FPGA.…”
Section: Introductionmentioning
confidence: 99%