2019
DOI: 10.1007/978-3-030-17227-5_28

Faster Convolutional Neural Networks in Low Density FPGAs Using Block Pruning

Cited by 7 publications (5 citation statements). References 15 publications.
“…In FPGA technology, compression techniques are suitable for reducing redundant parameters and the memory footprint, which has a direct impact on power consumption, speed and resource use [32][33][34][35]. Cheng et al. [36] presented a review of the state of the art in compression techniques, summarizing the different approaches as: parameter pruning and quantization, low-rank factorization, transferred/compact convolutional filters and knowledge distillation.…”
Section: Perspectives for an FPGA Realization of the NLCN (mentioning)
confidence: 99%
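
The compression families listed in the statement above are easy to sketch in code. Below is a minimal, hypothetical NumPy illustration of the first family, parameter pruning and quantization; the function names, the 50% sparsity target and the 8-bit width are illustrative assumptions, not the method of the cited works.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    # Zero the `sparsity` fraction of weights with the smallest magnitude.
    threshold = np.quantile(np.abs(weights), sparsity)
    return np.where(np.abs(weights) < threshold, 0.0, weights)

def uniform_quantize(weights: np.ndarray, bits: int = 8) -> np.ndarray:
    # Symmetric uniform quantization; returns dequantized floats.
    # On an FPGA the integer codes `q` would be stored, shrinking memory.
    scale = np.max(np.abs(weights)) / (2 ** (bits - 1) - 1)
    q = np.round(weights / scale).astype(np.int32)
    return q * scale

w = np.random.default_rng(0).standard_normal((64, 64)).astype(np.float32)
w_small = uniform_quantize(magnitude_prune(w, sparsity=0.5), bits=8)
```

Pruning removes parameters (cutting memory footprint and traffic), while quantization narrows the word length of those that remain; both map directly onto the FPGA power, speed and resource savings the statement describes.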
“…This generates unbalanced parallelism in the computation of output maps and irregular accesses to on-chip memory. A few approaches have been proposed to reduce the effects of sparsity [135,136]. In these works, pruning was guided by the datapath of the target computing processor so that it could take advantage of the available computing parallelism.…”
Section: Hardware-oriented Deep Neural Network Optimizations (mentioning)
confidence: 99%
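
A minimal sketch may help make the datapath-guided pruning concrete. Assuming a processor that consumes weights in fixed-width blocks (the block size of 8 and the L1-norm scoring are illustrative assumptions, not the exact criterion of [135,136]), pruning whole blocks rather than individual weights keeps the parallel lanes balanced and the on-chip memory accesses regular:

```python
import numpy as np

def block_prune(weights: np.ndarray, block: int, keep_ratio: float) -> np.ndarray:
    # Prune whole blocks of `block` consecutive weights per row, keeping the
    # `keep_ratio` fraction of blocks with the largest L1 norm. Block-granular
    # pruning matches a `block`-wide datapath, so surviving work stays dense.
    rows, cols = weights.shape
    assert cols % block == 0, "row length must be a multiple of the block size"
    blocks = weights.reshape(rows, cols // block, block)
    scores = np.abs(blocks).sum(axis=-1)            # L1 norm per block
    n_keep = max(1, int(keep_ratio * scores.shape[1]))
    # Per row, zero every block except the n_keep highest-scoring ones.
    cutoff = np.sort(scores, axis=1)[:, -n_keep][:, None]
    mask = (scores >= cutoff)[..., None]
    return (blocks * mask).reshape(rows, cols)

w = np.random.default_rng(1).standard_normal((4, 16)).astype(np.float32)
w_bp = block_prune(w, block=8, keep_ratio=0.5)  # one of two 8-wide blocks kept per row
```

Because every surviving block is dense and aligned to the datapath width, the hardware never stalls on irregular zero patterns, which is exactly the imbalance that unstructured sparsity creates.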
“…The first generation of CNN implementations took performance as the main optimization metric. Recently, a few works based on the single-module approach [22]–[24] have started to consider other metrics, such as area and power, to enable design trade-offs.…”
Section: Related Work (mentioning)
confidence: 99%
“…In [22], only convolutional layers are considered, while in [23], [24], convolutional and fully-connected layers are considered, but the same module is used for both. Instead of using the same hardware module for all layers, pipelines of layer-specific modules have been proposed [25], [26].…”
Section: Related Work (mentioning)
confidence: 99%