2018 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Ubiquitous Computing & Communications, Big Data & Cloud Computing (BDCloud), 2018
DOI: 10.1109/bdcloud.2018.00110

To Compress, or Not to Compress: Characterizing Deep Learning Model Compression for Embedded Inference

Abstract: The recent advances in deep neural networks (DNNs) make them attractive for embedded systems. However, it can take a long time for DNNs to make an inference on resource-constrained computing devices. Model compression techniques can address the computation issue of deep inference on embedded devices. This approach is highly attractive, as it does not rely on specialized hardware or on computation offloading, which is often infeasible due to privacy concerns or high latency. However, it remains unclear how model co…
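The abstract centers on compressing DNNs so that inference fits on resource-constrained devices. As a purely illustrative sketch (not the paper's own evaluation setup), the snippet below applies PyTorch's post-training dynamic quantization, one common compression technique; the stand-in model definition and the 8-bit target are assumptions for the example.

```python
# Minimal sketch: post-training dynamic quantization as one example of
# model compression for embedded inference (illustrative, not the paper's setup).
import torch
import torch.nn as nn

# A small stand-in network; any trained model could take its place.
model = nn.Sequential(
    nn.Linear(128, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)
model.eval()

# Quantize the Linear layers' weights to 8-bit integers; activations are
# quantized dynamically at inference time.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# The compressed model is a drop-in replacement for inference.
x = torch.randn(1, 128)
with torch.no_grad():
    y = quantized(x)
print(y.shape)  # torch.Size([1, 10])
```

Dynamic quantization of this kind shrinks the quantized layers' weight storage roughly fourfold and runs integer kernels at inference time, which is why it is a common first step on CPU-only embedded targets.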

Cited by 22 publications (17 citation statements)
References 12 publications

“…Deep learning techniques have shown astonishing success in various tasks, including object detection [21,31,71,76]. Efforts have been devoted to accelerating the inference of deep learning models on mobile devices through model compression techniques [17,36,56,69,76]. Actor leverages the recent advances in deep inference acceleration to speed up the DNN model execution time on the remote server to reduce the latency.…”
Section: Discussion and Future Work
Mentioning, confidence: 99%
“…Existing work on computation offloading for mobile computer vision has considered a range of techniques. These include trading DNN accuracy for lower latency [48,69], offloading the entire [16,45] or part of the object detection task [52,119] to the cloud. Although these approaches were effective for the traditional high definition (HD) video format with a resolution of up to 1,920 × 1,080, they would require over 300ms offloading latency even using a high-speed 5G network when processing a 4K video, and are thus ill-suited for emerging 4K MAR.…”
Section: Introduction
Mentioning, confidence: 99%
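The offloading trade-off described in the statement above (accuracy versus latency, local versus cloud execution) reduces to a per-frame deadline check. The sketch below is an assumption-heavy illustration, not the decision logic of any cited system; the function name, thresholds, and latency estimates are hypothetical.

```python
# Minimal sketch of a latency-budget offloading rule (illustrative only; the
# thresholds and measurements are assumptions, not values from the cited systems).

def choose_execution_target(frame_deadline_ms: float,
                            est_network_rtt_ms: float,
                            est_remote_infer_ms: float,
                            est_local_infer_ms: float) -> str:
    """Decide where to run inference for one frame.

    Offload only when the remote path (network round trip plus server-side
    inference) fits the frame deadline and beats running the compressed
    model locally.
    """
    remote_total_ms = est_network_rtt_ms + est_remote_infer_ms
    if remote_total_ms <= frame_deadline_ms and remote_total_ms < est_local_infer_ms:
        return "remote"
    return "local"

# Example: a frame with a 100 ms budget; a ~300 ms offloading path is rejected.
print(choose_execution_target(100.0, 280.0, 20.0, 90.0))  # -> "local"
```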
“…Further, we categorized the DNN compression using sparse representation into three sub-categories, i.e., quantization, multiplexing, and weight sharing. Section IV presents a detailed description of the sparse representation techniques in the existing literature [35], [37], [38], [43], [45], [48]-[52] on DNN compression.…”
Section: B. Sparse Representation
Mentioning, confidence: 99%
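Among the sparse-representation sub-categories named above, weight sharing is commonly realized by clustering weights so that many connections reuse a small codebook of shared values. The following is a minimal sketch of that idea using scikit-learn's KMeans on a random stand-in weight matrix; the matrix size and the 16-entry codebook are arbitrary assumptions, not settings from the cited works.

```python
# Minimal sketch of weight sharing via k-means clustering: each weight is
# replaced by its cluster centroid, so the layer stores only small cluster
# indices plus a codebook. Sizes and cluster count are arbitrary stand-ins.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
weights = rng.normal(size=(64, 64)).astype(np.float32)  # stand-in layer weights

n_clusters = 16  # a 4-bit codebook
kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
labels = kmeans.fit_predict(weights.reshape(-1, 1))

codebook = kmeans.cluster_centers_.flatten()        # 16 shared weight values
shared_weights = codebook[labels].reshape(weights.shape)

# Storage drops from 32 bits per weight to 4-bit indices plus the codebook.
err = np.abs(weights - shared_weights).mean()
print(f"mean absolute reconstruction error: {err:.4f}")
```

Storing 4-bit cluster indices plus a 16-entry codebook in place of 32-bit floats is what yields the compression; fine-tuning the shared centroids afterwards typically recovers most of the lost accuracy.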
“…Channel pruning [25]-[29], [83]; Filter pruning [30]-[34]; Connection pruning [35]-[43], [83]; Layer pruning [43]-[45]. Sparse representation: Quantization [35], [38], [45], [48]; Multiplexing [37], [49], [50]; Weight sharing [35], [37], [43], [51], [52]. Bits precision: Estimation using integer [48], [54]; Low bits representation [55]; Binarization [41], [56]-[58]. Knowledge distillation…”
Section: Network Pruning
Mentioning, confidence: 99%
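Connection pruning, the most densely cited row in the taxonomy above, is often implemented as magnitude-based pruning: connections whose weights have the smallest absolute values are zeroed. A minimal sketch using PyTorch's pruning utilities follows; the layer shape and the 50% sparsity level are illustrative assumptions.

```python
# Minimal sketch of magnitude-based connection pruning: zero out the 50% of
# weights with the smallest absolute values (illustrative settings only).
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

layer = nn.Linear(256, 128)

# L1 unstructured pruning removes individual connections by magnitude.
prune.l1_unstructured(layer, name="weight", amount=0.5)

# Make the pruning permanent (folds the mask into the weight tensor).
prune.remove(layer, "weight")

sparsity = (layer.weight == 0).float().mean().item()
print(f"weight sparsity: {sparsity:.2%}")  # ~50.00%
```

Note that zeroing weights alone does not speed up dense kernels; realizing latency gains additionally requires sparse-aware kernels or structured (channel/filter) pruning, which is why the taxonomy lists those as separate categories.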