2018 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Ubiquitous Computing & Communications, Big Data & Cloud Computing (BDCloud), 2018
DOI: 10.1109/bdcloud.2018.00110

To Compress, or Not to Compress: Characterizing Deep Learning Model Compression for Embedded Inference

Abstract: The recent advances in deep neural networks (DNNs) make them attractive for embedded systems. However, it can take a long time for DNNs to make an inference on resource-constrained computing devices. Model compression techniques can address the computation issue of deep inference on embedded devices. This approach is highly attractive, as it does not rely on specialized hardware or on computation offloading, which is often infeasible due to privacy concerns or high latency. However, it remains unclear how model co…
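The abstract centers on compressing DNNs so that inference fits on resource-constrained devices. As a purely illustrative sketch (not the paper's own evaluation setup), the snippet below applies PyTorch's post-training dynamic quantization, one common compression technique; the stand-in model definition and the 8-bit target are assumptions for the example.

```python
# Minimal sketch: post-training dynamic quantization as one example of
# model compression for embedded inference (illustrative, not the paper's setup).
import torch
import torch.nn as nn

# A small stand-in network; any trained model could take its place.
model = nn.Sequential(
    nn.Linear(128, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)
model.eval()

# Quantize the Linear layers' weights to 8-bit integers; activations are
# quantized dynamically at inference time.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# The compressed model is a drop-in replacement for inference.
x = torch.randn(1, 128)
with torch.no_grad():
    y = quantized(x)
print(y.shape)  # torch.Size([1, 10])
```

Dynamic quantization of this kind shrinks the quantized layers' weight storage roughly fourfold and runs integer kernels at inference time, which is why it is a common first step on CPU-only embedded targets.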

Cited by 22 publications (17 citation statements)
References 12 publications

“…Deep learning techniques have shown astonishing success in various tasks, including object detection [21,31,71,76]. Efforts have been devoted to accelerating the inference of deep learning models on mobile devices through model compression techniques [17,36,56,69,76]. Actor leverages the recent advances in deep inference acceleration to speed up the DNN model execution time on the remote server to reduce the latency.…”
Section: Discussion and Future Work
Mentioning, confidence: 99%
“…Existing work on computation offloading for mobile computer vision has considered a range of techniques. These include trading DNN accuracy for lower latency [48,69], offloading the entire [16,45] or part of the object detection task [52,119] to the cloud. Although these approaches were effective for the traditional high definition (HD) video format with a resolution of up to 1,920 × 1,080, they would require over 300ms offloading latency even using a high-speed 5G network when processing a 4K video, and are thus ill-suited for emerging 4K MAR.…”
Section: Introduction
Mentioning, confidence: 99%
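The offloading trade-off described in the statement above (accuracy versus latency, local versus cloud execution) reduces to a per-frame deadline check. The sketch below is an assumption-heavy illustration, not the decision logic of any cited system; the function name, thresholds, and latency estimates are hypothetical.

```python
# Minimal sketch of a latency-budget offloading rule (illustrative only; the
# thresholds and measurements are assumptions, not values from the cited systems).

def choose_execution_target(frame_deadline_ms: float,
                            est_network_rtt_ms: float,
                            est_remote_infer_ms: float,
                            est_local_infer_ms: float) -> str:
    """Decide where to run inference for one frame.

    Offload only when the remote path (network round trip plus server-side
    inference) fits the frame deadline and beats running the compressed
    model locally.
    """
    remote_total_ms = est_network_rtt_ms + est_remote_infer_ms
    if remote_total_ms <= frame_deadline_ms and remote_total_ms < est_local_infer_ms:
        return "remote"
    return "local"

# Example: a frame with a 100 ms budget; a ~300 ms offloading path is rejected.
print(choose_execution_target(100.0, 280.0, 20.0, 90.0))  # -> "local"
```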
“…Further, we categorized the DNN compression using sparse representation into three sub-categories, i.e., quantization, multiplexing, and weight sharing. Section IV presents a detailed description of the sparse representation techniques in the existing literature [35], [37], [38], [43], [45], [48]-[52] on DNN compression.…”
Section: B. Sparse Representation
Mentioning, confidence: 99%
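Among the sparse-representation sub-categories named above, weight sharing is commonly realized by clustering weights so that many connections reuse a small codebook of shared values. The following is a minimal sketch of that idea using scikit-learn's KMeans on a random stand-in weight matrix; the matrix size and the 16-entry codebook are arbitrary assumptions, not settings from the cited works.

```python
# Minimal sketch of weight sharing via k-means clustering: each weight is
# replaced by its cluster centroid, so the layer stores only small cluster
# indices plus a codebook. Sizes and cluster count are arbitrary stand-ins.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
weights = rng.normal(size=(64, 64)).astype(np.float32)  # stand-in layer weights

n_clusters = 16  # a 4-bit codebook
kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
labels = kmeans.fit_predict(weights.reshape(-1, 1))

codebook = kmeans.cluster_centers_.flatten()        # 16 shared weight values
shared_weights = codebook[labels].reshape(weights.shape)

# Storage drops from 32 bits per weight to 4-bit indices plus the codebook.
err = np.abs(weights - shared_weights).mean()
print(f"mean absolute reconstruction error: {err:.4f}")
```

Storing 4-bit cluster indices plus a 16-entry codebook in place of 32-bit floats is what yields the compression; fine-tuning the shared centroids afterwards typically recovers most of the lost accuracy.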
“…Channel pruning [25]-[29], [83]; Filter pruning [30]-[34]; Connection pruning [35]-[43], [83]; Layer pruning [43]-[45]. Sparse representation: Quantization [35], [38], [45], [48]; Multiplexing [37], [49], [50]; Weight sharing [35], [37], [43], [51], [52]. Bits precision: Estimation using integer [48], [54]; Low bits representation [55]; Binarization [41], [56]-[58]. Knowledge distillation…”
Section: Network Pruning
Mentioning, confidence: 99%
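Connection pruning, the most densely cited row in the taxonomy above, is often implemented as magnitude-based pruning: connections whose weights have the smallest absolute values are zeroed. A minimal sketch using PyTorch's pruning utilities follows; the layer shape and the 50% sparsity level are illustrative assumptions.

```python
# Minimal sketch of magnitude-based connection pruning: zero out the 50% of
# weights with the smallest absolute values (illustrative settings only).
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

layer = nn.Linear(256, 128)

# L1 unstructured pruning removes individual connections by magnitude.
prune.l1_unstructured(layer, name="weight", amount=0.5)

# Make the pruning permanent (folds the mask into the weight tensor).
prune.remove(layer, "weight")

sparsity = (layer.weight == 0).float().mean().item()
print(f"weight sparsity: {sparsity:.2%}")  # ~50.00%
```

Note that zeroing weights alone does not speed up dense kernels; realizing latency gains additionally requires sparse-aware kernels or structured (channel/filter) pruning, which is why the taxonomy lists those as separate categories.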