Deep neural networks (DNNs) are effective in solving many real-world problems. Larger DNN models usually exhibit better quality (e.g., accuracy), but their excessive computation results in long training and inference times. Model sparsification can reduce the computation and memory cost while maintaining model quality. Most existing sparsification algorithms unidirectionally remove weights, while others randomly or greedily explore a small subset of weights in each layer for pruning. These limitations restrict the achievable level of sparsity. In addition, many algorithms still require pre-trained dense models and thus suffer from a large memory footprint. In this paper, we propose a novel scheduled grow-and-prune (GaP) methodology that does not require pre-training a dense model. It addresses the shortcomings of previous work by repeatedly growing a subset of layers to dense and then pruning them back to sparse after some training. Experiments show that such models can match or beat the quality of highly optimized dense models at 80% sparsity on a variety of tasks, such as image classification, object detection, 3D object part segmentation, and translation. They also outperform other state-of-the-art (SOTA) methods for model sparsification. For example, a 90% sparse ResNet-50 obtained via GaP achieves 77.9% top-1 accuracy on ImageNet, improving on the SOTA results of sparsification algorithms by 1.5%.

Early works on weight pruning generally follow a prune-from-dense methodology [2,3,4,5,6,7,8,9], which typically requires three phases of training: pre-train a dense model, prune it to sparse, and fine-tune it. In such methodologies, however, the pre-trained dense models consume a large amount of memory and may lead to long training times. In addition, one-shot or iterative pruning from a well-trained DNN can only remove weights; it lacks the flexibility to grow back weights that appear unimportant early in training but prove significant later.
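
To make the scheduled grow-and-prune idea concrete, the sketch below shows one plausible way to structure such a training loop in PyTorch. It is only an illustration under assumptions not stated above: layers are split into groups that take turns being dense, the prune step uses magnitude-based masking, and all names (`magnitude_mask`, `scheduled_gap`, `train_steps_fn`, `rounds`, `sparsity`) are hypothetical rather than the authors' implementation.

```python
import torch


def magnitude_mask(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Return a 0/1 mask that keeps the largest-magnitude (1 - sparsity) fraction of weights."""
    k = max(1, int(weight.numel() * (1.0 - sparsity)))
    keep_idx = weight.abs().flatten().topk(k).indices
    mask = torch.zeros(weight.numel(), device=weight.device)
    mask[keep_idx] = 1.0
    return mask.view_as(weight)


def scheduled_gap(model, layer_groups, train_steps_fn, sparsity=0.8, rounds=4):
    """Sketch of a scheduled GaP loop.

    layer_groups: list of groups, each a list of (name, parameter) pairs.
    train_steps_fn(model, masks): trains for some steps, re-applying the
        masks after each optimizer update so pruned weights stay at zero.
    """
    # Start directly from sparse masks on every layer (no pre-trained dense model).
    masks = {name: magnitude_mask(p.data, sparsity)
             for group in layer_groups for name, p in group}

    for _ in range(rounds):
        for group in layer_groups:
            # Grow: let this group become dense (all-ones mask) while the
            # other groups remain sparse.
            for name, p in group:
                masks[name] = torch.ones_like(p.data)

            # Train the whole model for some steps in this mixed dense/sparse state.
            train_steps_fn(model, masks)

            # Prune: magnitude-prune the grown group back to the target sparsity.
            for name, p in group:
                masks[name] = magnitude_mask(p.data, sparsity)
                p.data.mul_(masks[name])
    return masks
```

In this reading, the grow step restores a group's pruned positions so they can compete for importance again, and the subsequent prune step keeps only the weights that proved significant after further training, which is the flexibility that one-directional prune-from-dense pipelines lack.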