2019 IEEE 26th International Conference on High Performance Computing, Data, and Analytics (HiPC)
DOI: 10.1109/hipc.2019.00033

SPEC2: SPECtral SParsE CNN Accelerator on FPGAs

Abstract: To accelerate inference of Convolutional Neural Networks (CNNs), various techniques have been proposed to reduce computation redundancy. Converting convolutional layers into the frequency domain significantly reduces the computation complexity of the sliding-window operations in the spatial domain. On the other hand, weight pruning techniques address the redundancy in model parameters by converting dense convolutional kernels into sparse ones. To obtain a high-throughput FPGA implementation, we propose SPEC2, the first w…
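The frequency-domain approach the abstract summarizes rests on the convolution theorem: a sliding-window convolution in the spatial domain becomes an elementwise product after a 2D FFT. The sketch below is a minimal NumPy illustration of that general identity, not SPEC2's FPGA kernel; all names and sizes are illustrative.

```python
import numpy as np

def fft_conv2d(activation, kernel):
    """Circular 2D convolution via the convolution theorem:
    FFT both operands, multiply elementwise, inverse-FFT."""
    h, w = activation.shape
    A = np.fft.fft2(activation)
    K = np.fft.fft2(kernel, s=(h, w))     # zero-pad kernel to map size
    return np.real(np.fft.ifft2(A * K))   # elementwise product, back to space

# Sanity check against a naive circular convolution.
rng = np.random.default_rng(0)
act = rng.standard_normal((8, 8))
ker = rng.standard_normal((3, 3))

naive = np.zeros_like(act)
for i in range(8):
    for j in range(8):
        for di in range(3):
            for dj in range(3):
                naive[i, j] += ker[di, dj] * act[(i - di) % 8, (j - dj) % 8]

assert np.allclose(fft_conv2d(act, ker), naive)
```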

Cited by 11 publications (11 citation statements). References 22 publications (42 reference statements).
“…They keep the input activations in SRAM and stream the sparse kernels. A similar design [Niu et al 2019] streams activations with stationary weights. Both have limited reuse due to the limited BRAM (on-chip SRAM) on FPGAs.…”
Section: Inference Accelerators (mentioning)
confidence: 99%
“…To make a fair comparison across different platforms, we also present the DSP-efficiency and logic-efficiency on each platform. On average, our design exhibits 0.24 GOP/s/DSP DSP-efficiency, which shows 2.5X-5.7X improvement compared with prior works [16,21,48]. On the other hand, our design shows lower logic-efficiency.…”
Section: B. Performance Analysis (mentioning)
confidence: 68%
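For context, DSP-efficiency here is sustained throughput divided by the number of DSP slices a design consumes. A hedged back-of-envelope follows, assuming the 0.24 GOP/s/DSP figure and the 309.0 GOP/s VGG throughput quoted in the next excerpt describe the same configuration, which the excerpts do not state explicitly:

```python
# Back-of-envelope only; the DSP count is implied, not reported.
throughput_gops = 309.0    # GOP/s, VGG throughput from the excerpt below
gops_per_dsp = 0.24        # GOP/s per DSP slice, from this excerpt
print(round(throughput_gops / gops_per_dsp))   # ~1288 DSP slices implied
```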
“…The performance on VGG network is 309.0 GOP/s which is 3.6X-4.8X higher than [16,21]. [48] shows higher performance because they pruned the network in the frequency domain which results in elementwise multiplication pattern. This computation pattern shows less complexity compared with the convolution operator.…”
Section: B. Performance Analysis (mentioning)
confidence: 98%
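The complexity claim in this excerpt can be made concrete with a rough multiply count. A sketch under illustrative assumptions (a VGG-like 224×224 feature map and 3×3 kernel; tiling, batching, and complex-arithmetic constants in the real accelerators are ignored):

```python
import math

n, k = 224, 3                    # illustrative feature-map and kernel sizes

direct_mults = n * n * k * k     # sliding window: k*k multiplies per pixel
pointwise_mults = n * n          # frequency domain: one multiply per pixel
fft_overhead = 2 * n * n * math.log2(n)   # rough O(n^2 log n) transform cost

print(direct_mults / pointwise_mults)     # 9.0: the k*k reduction factor
```

When transformed kernels are reused across many inputs, the FFT overhead amortizes and the k·k reduction factor dominates, which matches the excerpt's observation that the elementwise pattern is cheaper than the convolution operator.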
“…1a). Besides, additional control logic is required to compute operations (e.g matrix multiplication) with such formats, increasing the complexity and power consumption for embedded applications [14], [15], [16]. Therefore, we propose a framework that naturally generates structured sparsity for several levels of granularity, by fixing the number of active elements within a candidate set (comprising e.g.…”
Section: Introduction (mentioning)
confidence: 99%
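The scheme this excerpt describes (a fixed number of active elements per candidate set) is commonly known as N:M structured sparsity. Below is a minimal NumPy sketch of that general idea, not the cited framework's code; the group size, keep count, and function name are illustrative.

```python
import numpy as np

def prune_n_of_m(weights, n_keep=2, m_group=4):
    """Keep the n_keep largest-magnitude weights in every contiguous
    group of m_group weights; zero the rest. The fixed per-group count
    yields a regular pattern, avoiding the extra control logic that
    irregular sparse formats require."""
    flat = weights.reshape(-1, m_group)
    # Indices of the (m_group - n_keep) smallest |w| in each group.
    drop = np.argsort(np.abs(flat), axis=1)[:, : m_group - n_keep]
    pruned = flat.copy()
    np.put_along_axis(pruned, drop, 0.0, axis=1)
    return pruned.reshape(weights.shape)

w = np.random.randn(8, 8)     # total size divisible by m_group
w_sparse = prune_n_of_m(w)    # exactly 2 nonzeros per group of 4
```

Because every group carries exactly the same number of nonzeros, an accelerator can schedule multiplications with a fixed pattern instead of the per-matrix control logic the excerpt attributes to unstructured formats.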