2020
DOI: 10.1109/tvlsi.2020.2995741

Uni-OPU: An FPGA-Based Uniform Accelerator for Convolutional and Transposed Convolutional Networks

Cited by 43 publications (17 citation statements)
References 21 publications
“…The obtained results are summarized in Table 1 in terms of: supported parallelism (T_M, T_N, P_M, and P_N), kernel size (K) and stride (S); resources requirements; running frequency; number of operations performed per second (GOPs); and, finally, dynamic power consumption. It is worth highlighting that while the designs presented in [15][16][17] are SUs, those demonstrated in [9,11,18] are embedded heterogeneous systems (ESs). For this reason, several SU and ES versions of the design here presented have been characterized and they are referenced in Table 1.…”
Section: Implementation and Results (mentioning)
confidence: 99%
“…The efficient design strategy recently presented in [17] overcomes the above issues by performing a kernel conversion to calculate all the pre-addable weight combinations. The output of this process is a new set of filters that can be directly applied to the ifmaps to perform a traditional 3D convolution.…”
Section: Background, Related Work and Motivations (mentioning)
confidence: 99%
“…DNN acceleration solutions mainly faces two bottlenecks: enormous multiply and accumulate (MAC) operations and great number of parameters. To deal with these problems, researchers have been focused on application and specific integrated circuits (ASIC) [7,8,9,10,11,12,13,14,15] and field-programmable gate array (FPGA) [16,17,18,19,20,21,22,23,24]. Due to its high parallelism property, data flow architectures has become a key research area [8,9,10,11,12,13,18,19,20].…”
Section: Introduction (mentioning)
confidence: 99%