PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning

Niu, Wei; Ma, Xiaolong; Lin, Sheng; Wang, Shihao; Qian, Xuehai; Lin, Xue; Wang, Yanzhi; Ren, Bin

doi:10.1145/3373376.3378534

Cited by 208 publications

(129 citation statements)

References 51 publications

“…Especially, deep learning models in embedded devices such as mobile or IoT devices require efficient processing. The examples are the face recognition model on a single-board computer [44], real-time DNN model in mobile devices [45], and emotion recognition in Raspberry Pi [46]. As a result, the proposed strategy, i.e., pre-processing for excluding unnecessary parts with a negligible cost, which incur significant overhead in the deep learning process, can be adapted and investigated to the deep-learning based methods for the other problems.…”

Section: Discussion (mentioning)

confidence: 99%

A Preprocessing Strategy for Denoising of Speech Data Based on Speech Segment Detection

Lee, Kwon. 2020. Applied Sciences.

In this paper, we propose a preprocessing strategy for denoising of speech data based on speech segment detection. A design of computationally efficient speech denoising is necessary to develop a scalable method for large-scale data sets. Furthermore, it becomes more important as the deep learning-based methods have been developed because they require significant costs while showing high performance in general. The basic idea of the proposed method is using the speech segment detection so as to exclude non-speech segments before denoising. The speech segmentation detection can exclude non-speech segments with a negligible cost, which will be removed in denoising process with a much higher cost, while maintaining the accuracy of denoising. First, we devise a framework to choose the best preprocessing method for denoising based on the speech segment detection for a target environment. For this, we speculate the environments for denoising using different levels of signal-to-noise ratio (SNR) and multiple evaluation metrics. The framework finds the best speech segment detection method tailored to a target environment according to the performance evaluation of speech segment detection methods. Next, we investigate the accuracy of the speech segment detection methods extensively. We conduct the performance evaluation of five speech segment detection methods with different levels of SNRs and evaluation metrics. Especially, we show that we can adjust the accuracy between the precision and recall of each method by controlling a parameter. Finally, we incorporate the best speech segment detection method for a target environment into a denoising process. 
Through extensive experiments, we show that the accuracy of the proposed scheme is comparable to or even better than that of Wavenet-based denoising, which is one of recent advanced denoising methods based on deep neural networks, in terms of multiple evaluation metrics of denoising, i.e., SNR, STOI, and PESQ, while it can reduce the denoising time of the Wavenet-based denoising by approximately 40–50% according to the used speech segment detection method.

“…Fine-Grained Pattern-Based Pruning. The state-of-the-art pruning work [47] proposes a fine-grained pattern-based pruning scheme, which generates an intermediate sparsity type between non-structured pruning and structured pruning. They prune a fixed number of weights in each convolution kernel (e.g., pruning 5 weights out of 9 weights in a 3×3 convolution kernel), and make the remaining weights to be concentrated in a certain area to form specific kernel patterns (called pattern sparsity), as shown in Figure 1 (left).…”

Section: Background 2.1 DNN Model Pruning (mentioning)

confidence: 99%
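
The statement above describes keeping a fixed number of weights per 3×3 kernel (4 of 9) so that the survivors form one of a small set of kernel patterns. A minimal sketch of that idea follows. The candidate pattern set (every 4-entry pattern that keeps the centre weight) and the selection rule (largest retained squared magnitude) are assumptions made for illustration, not the curated pattern set or the selection procedure of the cited work, and `prune_kernel` is a hypothetical helper name.

```python
from itertools import combinations

CENTER = 4  # flat (row-major) index of the centre weight in a 3x3 kernel

# Assumed candidate set: all 4-entry patterns that keep the centre weight.
# This is an illustrative choice, not the pattern library of the cited paper.
PATTERNS = [
    frozenset((CENTER,) + rest)
    for rest in combinations([i for i in range(9) if i != CENTER], 3)
]


def prune_kernel(kernel):
    """Keep 4 of the 9 weights of a 3x3 kernel according to the best pattern.

    Returns the pruned kernel (same 3x3 layout, pruned weights set to 0.0)
    and the sorted flat indices of the chosen pattern.
    """
    flat = [w for row in kernel for w in row]
    if len(flat) != 9:
        raise ValueError("expected a 3x3 kernel")
    # Assumed selection rule: keep the pattern retaining the most squared magnitude.
    best = max(PATTERNS, key=lambda p: sum(flat[i] ** 2 for i in p))
    pruned = [flat[i] if i in best else 0.0 for i in range(9)]
    return [pruned[r * 3:(r + 1) * 3] for r in range(3)], sorted(best)


kernel = [
    [0.1, -0.9, 0.2],
    [0.8, 0.5, -0.05],
    [0.0, 0.7, 0.1],
]
pruned, pattern = prune_kernel(kernel)
print(pattern)
print(pruned)
```

Because every kernel ends up with the same number of non-zeros in one of a known set of layouts, a compiler can generate one specialised loop body per pattern, which is the property the citing papers refer to as pattern sparsity.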

“…Recent works [38,47] have applied pattern-based pruning techniques for improving inference efficiency. However, these inference-focused strategies will pose several challenges to reach our three optimization objectives.…”

Section: Challenges Of Pattern-based Pruning In Training (mentioning)

confidence: 99%

ClickTrain

Zhang, Yuan, Niu, et al. 2021. Proceedings of the ACM International Conference on Supercomputing.

Self Cite

Convolutional neural networks (CNNs) are becoming increasingly deeper, wider, and non-linear because of the growing demand on prediction accuracy and analysis quality. The wide and deep CNNs, however, require a large amount of computing resources and processing time. Many previous works have studied model pruning to improve inference performance, but little work has been done for effectively reducing training cost. In this paper, we propose ClickTrain: an efficient and accurate end-to-end training and pruning framework for CNNs. Different from the existing pruning-during-training work, ClickTrain provides higher model accuracy and compression ratio via fine-grained architecture-preserving pruning. By leveraging pattern-based pruning with our proposed novel accurate weight importance estimation, dynamic pattern generation and selection, and compiler-assisted computation optimizations, ClickTrain generates highly accurate and fast pruned CNN models for direct deployment without any extra time overhead, compared with the baseline training. ClickTrain also reduces the end-to-end time cost of the pruning-after-training method by up to 2.3× with comparable accuracy and compression ratio. Moreover, compared with the state-of-the-art pruning-during-training approach, ClickTrain provides significant improvements in both accuracy and compression ratio on the tested CNN models and datasets, under similar limited training time. CCS Concepts: Computing methodologies → Neural networks.

“…Given an unpruned CNN model, our system first performs non-structured weight pruning with the Alternating Direction Method of Multipliers (ADMM) algorithm. Previous works have shown that ADMM-based algorithms can achieve the state-of-the-art compression ratio for CNNs with little accuracy loss [33,46]. Readers can refer to [46] for more details.…”

Section: Performance Challenges With CNN Pruning (mentioning)

confidence: 99%
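
The statement above mentions non-structured weight pruning with the Alternating Direction Method of Multipliers (ADMM). As a rough illustration of the general ADMM pruning loop, and not of the cited system's implementation, the sketch below alternates a weight update, a projection onto the set of vectors with at most `k` non-zeros, and a dual update. The toy loss `0.5 * ||W - target||^2` is an assumption chosen so that the weight update has a closed form; `project_topk` and `admm_prune` are hypothetical names.

```python
def project_topk(w, k):
    """Euclidean projection onto vectors with at most k non-zeros:
    keep the k largest-magnitude entries and zero the rest."""
    order = sorted(range(len(w)), key=lambda i: abs(w[i]), reverse=True)
    keep = set(order[:k])
    return [w[i] if i in keep else 0.0 for i in range(len(w))]


def admm_prune(target, k, rho=1.0, steps=50):
    """Toy ADMM pruning with the assumed loss f(W) = 0.5 * ||W - target||^2.

    W: dense weights, Z: sparse copy, U: scaled dual variable.
    Returns Z, the sparse solution with at most k non-zeros.
    """
    n = len(target)
    w = list(target)
    z = project_topk(w, k)
    u = [0.0] * n
    for _ in range(steps):
        # W-update: minimise f(W) + (rho / 2) * ||W - Z + U||^2 (closed form here).
        w = [(target[i] + rho * (z[i] - u[i])) / (1.0 + rho) for i in range(n)]
        # Z-update: project W + U onto the sparsity constraint.
        z = project_topk([w[i] + u[i] for i in range(n)], k)
        # Dual update: accumulate the remaining disagreement between W and Z.
        u = [u[i] + w[i] - z[i] for i in range(n)]
    return z


sparse = admm_prune([0.9, -0.1, 0.05, -0.7], k=2)
print(sparse)
```

In a real network the closed-form weight update would be replaced by a few epochs of gradient descent on the training loss plus the quadratic penalty; the projection and dual steps keep the same shape.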

“…Another approach is to design more hardware-amenable pruning strategies [8,29]. For example, a hybrid strategy by combining structured and non-structured pruning can achieve good accuracy while maintaining some regular patterns in the pruned model for efficient hardware processing [29,33]. These works, however, lack a careful examination of the code optimization opportunities, resulting in restricted pruning choices and sub-optimal performance.…”

mentioning

confidence: 99%

Accelerating Sparse CNN Inference on GPUs with Performance-Aware Weight Pruning

Rumi, Wang, et al. 2020. Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques.

Self Cite

Weight pruning is a popular technique to reduce the size and computation complexity of the Convolutional Neural Networks (CNNs). Despite its success in reducing the model size, weight pruning has brought limited benefit to the CNN inference performance, due to the irregularity introduced in the sparse convolution operations. In this work, we aim to improve the performance of sparse convolutions on GPUs by mitigating the irregularity. We find that the existing performance optimization techniques for sparse matrix computations fail to accelerate sparse convolutions, and we observe that the main performance bottleneck is caused by the heavy control-flow instructions. Based on this observation, we propose a new GEMM-based implementation of sparse convolutions. Our main idea is to extract dense blocks of non-zeros in the sparse convolution kernels, and use dense matrix-matrix multiplication for these dense blocks to achieve high throughput. For cases where many non-zero weights cannot be grouped into dense blocks, we propose a performance-aware re-pruning strategy that removes the least important weights in the sparse kernels to further improve the throughput. The experimental results with five real-world pruned CNN models show that our techniques can significantly improve the layer-wise performance of sparse convolution operations as well as the end-to-end performance of CNN inference. CCS Concepts: Computing methodologies → Neural networks; Software and its engineering → Source code generation.
