2021
DOI: 10.48550/arxiv.2101.07948
Preprint

SparseDNN: Fast Sparse Deep Learning Inference on CPUs

Abstract: The last few years have seen gigantic leaps in algorithms and systems to support efficient deep learning inference. Pruning and quantization algorithms can now consistently compress neural networks by an order of magnitude. For a compressed neural network, a multitude of inference frameworks have been designed to maximize the performance of the target hardware. While we find mature support for quantized neural networks in production frameworks such as OpenVINO and MNN, support for pruned sparse neural networks…
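To ground the sparsity setting the abstract describes, here is a minimal sketch, not taken from the paper, of applying a pruned fully connected layer as a sparse matrix-vector product using SciPy's CSR format. The 1024x1024 shape and roughly 90% pruning level are illustrative assumptions only; SparseDNN itself ships its own specialized CPU kernels rather than relying on SciPy.

```python
# Illustrative sketch (not from the paper): a pruned layer applied as a
# sparse matrix-vector product. Shapes and sparsity level are arbitrary.
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)

# Dense weights for a 1024 -> 1024 layer, then prune ~90% of the entries.
W = rng.standard_normal((1024, 1024)).astype(np.float32)
mask = rng.random(W.shape) < 0.10                 # keep ~10% of weights
W_pruned = np.where(mask, W, 0.0).astype(np.float32)

# Store the pruned weights in CSR so zeros cost no storage or compute.
W_csr = sparse.csr_matrix(W_pruned)

x = rng.standard_normal(1024).astype(np.float32)

dense_out = W_pruned @ x                          # dense kernel still touches zeros
sparse_out = W_csr @ x                            # sparse kernel skips them

print(np.allclose(dense_out, sparse_out, atol=1e-4))
print(f"stored nonzeros: {W_csr.nnz} of {W.size}")
```

The CSR representation stores only the nonzero weights (about 10% of the entries here), so a sparse kernel reads and multiplies far fewer values than the dense one.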

Cited by 1 publication (1 citation statement)
References: 34 publications
“…Hardware Support For Sparsity. Several works [220, 310] propose to leverage existing general-purpose CPUs/GPUs to support sparsity in pruned NN models. Others propose domain-specific hardware accelerators, which bring much higher efficiency at the cost of longer design cycles and a larger design automation burden [205, 295, 297].…”
Section: Hardware System (mentioning)
confidence: 99%