Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays 2019
DOI: 10.1145/3289602.3293925

Compute-Efficient Neural-Network Acceleration

Citations: cited by 18 publications (11 citation statements)
References: 10 publications
“…The CORDIC module is used in hyperbolic rotation mode for realizing sinh and cosh functions using Eq. (10). We take an example calculation for sinh (30) and cosh (30) as shown in Table 3.…”
Section: B Activation Function Computation Using Recon (mentioning)
confidence: 99%
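The hyperbolic rotation mode mentioned in the statement above can be illustrated with a minimal Python sketch. It is not taken from the citing paper: the function names, the iteration count `n`, and the use of floating point in place of hardware shift-and-add are all assumptions made for illustration. The basic algorithm only converges for arguments of magnitude up to roughly 1.118, so the sketch uses 0.5 as its input and makes no claim about how the citing work handles larger arguments.

```python
import math

def hyperbolic_iteration_indices(n):
    """Return n shift indices 1, 2, 3, 4, 4, 5, ...

    Hyperbolic CORDIC must repeat indices 4, 13, 40, ... to converge.
    """
    indices = []
    i = 1
    repeat = 4
    while len(indices) < n:
        indices.append(i)
        if i == repeat and len(indices) < n:
            indices.append(i)          # repeated iteration
            repeat = 3 * repeat + 1    # next index to repeat
        i += 1
    return indices

def cordic_sinh_cosh(theta, n=24):
    """Approximate (sinh(theta), cosh(theta)) with hyperbolic-rotation CORDIC.

    Assumes |theta| is within the convergence range (about 1.118).
    Floats stand in for the fixed-point shifts a hardware module would use.
    """
    idx = hyperbolic_iteration_indices(n)

    # Gain of the hyperbolic micro-rotations; pre-scaling x removes it.
    gain = 1.0
    for i in idx:
        gain *= math.sqrt(1.0 - 2.0 ** (-2 * i))

    x, y, z = 1.0 / gain, 0.0, theta
    for i in idx:
        d = 1.0 if z >= 0 else -1.0    # rotate so the residual angle z goes to 0
        t = 2.0 ** -i                  # a right shift by i bits in hardware
        x, y = x + d * y * t, y + d * x * t
        z -= d * math.atanh(t)         # table lookup in hardware
    return y, x

if __name__ == "__main__":
    s, c = cordic_sinh_cosh(0.5)
    print(s, math.sinh(0.5))
    print(c, math.cosh(0.5))
```

With 24 iterations the results agree with `math.sinh` and `math.cosh` to within about 1e-6 for this input; a hardware design would trade iteration count against word length.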
“…This trade-off gets even more complicated when configurable architectures are required. The FPGAs offer configurable hardware designs but need more chip areas with higher power consumption as compared to ASICs [5], [9], [10].…”
Section: Introduction (mentioning)
confidence: 99%
“…This micro-architecture is optimized for both computation speed and resource utilization. In [24], a pipeline and DRAM-free architecture is proposed to simplify data movement between memory and processing elements. This design achieves very high working frequency as well as DSP utilization.…”
Section: CNN Accelerators (mentioning)
confidence: 99%
“…Furthermore, these hard blocks provide specialized nearest-neighbor interconnect for high-bandwidth, low-latency cascade data movement. These features make it particularly attractive for building systolic neural network accelerators such as CLP [42,43], Cascades [40], and Xilinx SuperTile [49,50].…”
Section: Introduction (mentioning)
confidence: 99%
“…The CLP designs presented in [42,43] only operate at 100–170 MHz on Virtex-7 FPGAs but leave DSPs unused. The Xilinx SuperTile [49,50] designs run at 720 MHz, but leave half of the DSPs unused, and also waste URAM bandwidth by limiting access. The chip-spanning 650 MHz 1920×9 systolic array design for the VU11P FPGA [40] requires 95% or more of the hard block resources but fails to route in commercial-grade Xilinx Vivado run with high effort due to congestion.…”
Section: Introduction (mentioning)
confidence: 99%