2019
DOI: 10.1109/jetcas.2019.2952137

Memory-Reduced Network Stacking for Edge-Level CNN Architecture With Structured Weight Pruning

Cited by 19 publications (4 citation statements)
References 25 publications
“…As shown in Table 1, three baseline DCNN architectures are used for evaluations of the three compression techniques described in Section II: quantization (T0), pruning (T1), and channel scaling (T2), where we set f0 = 10, f1 = 10, and f2 = 4, respectively. Based on prior research [16], [20], [21], the maximum compression levels for the three optimization approaches are achieved by adopting 8-bit quantization, 99% weight pruning, and 0.25-scaled channels.…”
Section: Results
mentioning, confidence: 99%
“…1/3 of the weights are pruned for ResNet and AlexNet models when tested on the CIFAR dataset, with an accuracy drop of around 0.8% in 100 epochs. S. Moon et al. [14] have proposed a novel memory-reduced multiple accuracy pruning method. This method is a combination of multiple CNN optimization techniques.…”
Section: A Network Compression Using Pruning Methods
mentioning, confidence: 99%
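The cited paper targets structured weight pruning. Purely as an illustration of the general idea, and not of the authors' specific algorithm, the sketch below drops whole convolution filters ranked by L1 norm, so memory shrinks in contiguous blocks rather than scattered elements; the function name and the L1 ranking criterion are assumptions.

```python
import numpy as np

def prune_filters_l1(w, keep_ratio=0.5):
    """Structured pruning sketch: drop whole output filters with the smallest L1 norms.

    w: conv weight tensor of shape (out_channels, in_channels, kH, kW).
    Returns the reduced tensor and the indices of the filters that were kept.
    """
    norms = np.abs(w).reshape(w.shape[0], -1).sum(axis=1)  # L1 norm per filter
    n_keep = max(1, int(round(w.shape[0] * keep_ratio)))
    keep_idx = np.sort(np.argsort(norms)[-n_keep:])        # strongest filters
    return w[keep_idx], keep_idx

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    w = rng.standard_normal((64, 32, 3, 3)).astype(np.float32)
    w_small, kept = prune_filters_l1(w, keep_ratio=0.5)
    print(w.shape, "->", w_small.shape)  # (64, 32, 3, 3) -> (32, 32, 3, 3)
```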
“…Some studies (Zhang et al., 2018; Moon et al., 2019) have focused on weight pruning, which is a common type of unstructured pruning and involves removing individual weights or neurons from the network without any constraints on their location or connectivity. Weight pruning is very effective in reducing the number of parameters and computations in a network, as it allows for fine-grained control over the sparsity level and can achieve very high compression ratios.…”
Section: Related Work
mentioning, confidence: 99%
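The quoted description of unstructured weight pruning (individual weights removed anywhere in the network, with fine-grained control over the sparsity level) can be made concrete with the small sketch below, which prunes by magnitude at several sparsity levels and compares a dense buffer against a naive index/value sparse storage. This is a generic illustration under assumed names, not code from either cited study.

```python
import numpy as np

def to_sparse(w, sparsity=0.9):
    """Unstructured pruning sketch: keep individual weights anywhere in the tensor,
    then store only the survivors as (flat index, value) pairs."""
    flat = w.ravel()
    threshold = np.quantile(np.abs(flat), sparsity)
    keep = np.nonzero(np.abs(flat) >= threshold)[0]
    return keep.astype(np.int32), flat[keep]

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    w = rng.standard_normal((256, 256)).astype(np.float32)

    for sparsity in (0.5, 0.9, 0.99):  # fine-grained control of the sparsity level
        idx, vals = to_sparse(w, sparsity)
        dense_bytes = w.nbytes
        sparse_bytes = idx.nbytes + vals.nbytes
        print(f"sparsity {sparsity:.2f}: compression ratio {dense_bytes / sparse_bytes:.1f}x")
```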