2022
DOI: 10.48550/arxiv.2202.08132
Preprint

Prospect Pruning: Finding Trainable Weights at Initialization using Meta-Gradients

Abstract: Pruning neural networks at initialization would enable us to find sparse models that retain the accuracy of the original network while consuming fewer computational resources for training and inference. However, current methods are insufficient to enable this optimization and lead to a large degradation in model performance. In this paper, we identify a fundamental limitation in the formulation of current methods, namely that their saliency criteria look at a single step at the start of training without taking…
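The saliency criteria the abstract refers to score each weight from a single gradient evaluation at initialization. As a point of reference, here is a minimal PyTorch-style sketch of such a one-step, SNIP-style connection-sensitivity score combined with a global keep-ratio; the function names and the keep ratio are illustrative, and this is not the multi-step meta-gradient procedure proposed in the paper.

```python
import torch
import torch.nn as nn

def single_step_saliency(model: nn.Module, inputs, targets,
                         loss_fn=nn.CrossEntropyLoss()):
    """Connection-sensitivity style score |w * dL/dw| from a single batch
    at initialization -- the one-step formulation the abstract describes
    as limited, since it never looks beyond the first gradient evaluation."""
    model.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    scores = {}
    for name, p in model.named_parameters():
        if p.grad is not None and p.dim() > 1:   # score weight tensors only
            scores[name] = (p.detach() * p.grad.detach()).abs()
    return scores

def global_keep_masks(scores, keep_ratio=0.1):
    """Keep the top `keep_ratio` fraction of weights globally by score."""
    flat = torch.cat([s.flatten() for s in scores.values()])
    k = max(1, int(keep_ratio * flat.numel()))
    threshold = torch.topk(flat, k).values.min()
    return {name: (s >= threshold).float() for name, s in scores.items()}
```

Per the title and abstract, the paper's formulation instead scores weights using meta-gradients taken over several training steps rather than this single step; that extension is not shown here.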

Cited by 2 publications (4 citation statements)
References 4 publications
“…SNIP (Lee et al., 2018) is one of the pioneering works that aim to find trainable sub-networks without any training. Several follow-up works (Wang et al., 2020a; Tanaka et al., 2020; de Jorge et al., 2020; Alizadeh et al., 2022) propose different metrics for pruning networks at initialization. Among them, Synflow (Tanaka et al., 2020), SPP (Lee et al., 2019), and FORCE (de Jorge et al., 2020) try to address the problem of layer collapse during pruning.…”
Section: Neural Network Pruning
confidence: 99%
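Layer collapse, mentioned at the end of this excerpt, occurs when a pruning criterion removes every connection in some layer and disconnects the network. The sketch below illustrates the iterative re-scoring schedule that this family of methods uses to avoid it; `score_fn` is a generic placeholder and the exponential schedule is an assumption, not the exact procedure of Synflow, SPP, or FORCE.

```python
import torch

def iterative_prune(params, score_fn, final_keep=0.05, rounds=20):
    """Prune to `final_keep` density over several rounds instead of one shot.

    Re-scoring the surviving weights after each round is the ingredient that
    lets iterative schemes avoid deleting an entire layer at once (layer
    collapse). `score_fn(params, masks)` stands in for any saliency criterion
    and is assumed to return one positive score tensor per parameter name.
    """
    masks = {name: torch.ones_like(p) for name, p in params.items()}
    for r in range(1, rounds + 1):
        keep = final_keep ** (r / rounds)        # exponential sparsity schedule
        scores = score_fn(params, masks)
        flat = torch.cat([(scores[n] * masks[n]).flatten() for n in params])
        k = max(1, int(keep * flat.numel()))
        threshold = torch.topk(flat, k).values.min()
        masks = {n: ((scores[n] * masks[n]) >= threshold).float() for n in params}
    return masks
```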
“…Finally, pruning LeNet-5 and AlexNet can compress the number of parameters by factors of 108 and 17.7, respectively, without loss of network performance. N. Lee and M. Alizadeh perform one-shot pruning on the network at initialization, before any training [11,12]. Based on the importance of each weight connection, these methods determine which weights matter and remove those with low importance.…”
Section: Unstructured Pruning
confidence: 99%
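The magnitude criterion described in this passage amounts to ranking weights by absolute value and removing the smallest ones in a single shot. A minimal sketch, assuming a global threshold over all weight tensors of a PyTorch model; the model and keep ratio are illustrative, not the setups of [11] or [12]:

```python
import torch
import torch.nn as nn

@torch.no_grad()
def magnitude_prune(model: nn.Module, keep_ratio: float = 0.2):
    """One-shot global magnitude pruning: zero out the (1 - keep_ratio)
    fraction of weights with the smallest absolute value."""
    weights = [p for p in model.parameters() if p.dim() > 1]
    all_mags = torch.cat([p.abs().flatten() for p in weights])
    k = max(1, int(keep_ratio * all_mags.numel()))
    threshold = torch.topk(all_mags, k).values.min()
    for p in weights:
        p.mul_((p.abs() >= threshold).float())   # apply the mask in place

# Example: prune a small MLP to 20% density
model = nn.Sequential(nn.Linear(784, 300), nn.ReLU(), nn.Linear(300, 10))
magnitude_prune(model, keep_ratio=0.2)
```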
“…Ref | Unit | Criterion | Method
[8,9] | weights | weights magnitude | train, prune and fine-tune
[10] | weights | weights magnitude | mask learning
[11] | weights | weights magnitude | prune and train
[12] | weights | weights magnitude | prune and train
[14] | filters | L1 norm | train, prune and fine-tune
[15] | filters | filters magnitude | group-LASSO regularization
[16] | filters | magnitude of batchnorm parameters | train, prune and fine-tune
[17] | filters | output of the next layer | train, prune and fine-tune
[18] | filters | geometric median of common information in filters | train, prune and fine-tune
[19] | filters | average rank of feature map | train, prune and fine-tune
[20] | filters | channel independence | train, prune and fine-tune
[21,22] | filters | L_p norm | train, prune and fine-tune…”
Section: Article Structure
confidence: 99%
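Several rows of the table use a filter-level L1-norm criterion as in [14]. Below is a small sketch of how such filter scores can be computed for a convolutional layer and the lowest-scoring filters selected for removal, assuming a plain PyTorch `nn.Conv2d`; it illustrates the criterion only, not the full pipeline of any cited work.

```python
import torch
import torch.nn as nn

def l1_filter_scores(conv: nn.Conv2d) -> torch.Tensor:
    """L1 norm of each output filter's kernel: shape (out_channels,)."""
    return conv.weight.detach().abs().sum(dim=(1, 2, 3))

def filters_to_prune(conv: nn.Conv2d, prune_ratio: float = 0.3):
    """Indices of the lowest-L1 filters to remove (structured pruning)."""
    scores = l1_filter_scores(conv)
    n_prune = int(prune_ratio * conv.out_channels)
    return torch.argsort(scores)[:n_prune]

conv = nn.Conv2d(64, 128, kernel_size=3)
print(filters_to_prune(conv, prune_ratio=0.25))  # 32 filter indices
```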