2022
DOI: 10.48550/arxiv.2202.12002
Preprint

Rare Gems: Finding Lottery Tickets at Initialization

Abstract: It has been widely observed that large neural networks can be pruned to a small fraction of their original size, with little loss in accuracy, typically by following a time-consuming "train, prune, re-train" approach. Frankle & Carbin (2018) conjecture that we can avoid this by training lottery tickets, i.e., special sparse subnetworks found at initialization, that can be trained to high accuracy. However, a subsequent line of work presents concrete evidence that current algorithms for finding trainable networks…
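
As a point of reference for the pipeline the abstract contrasts against, below is a minimal sketch of the conventional "train, prune, re-train" loop (magnitude pruning), not the method proposed in the paper. The model, the `train_one_epoch` helper, and the data loader are hypothetical placeholders.

```python
# Minimal sketch of the "train, prune, re-train" pipeline the abstract refers to
# (magnitude pruning). `train_one_epoch` and `train_loader` are hypothetical
# placeholders, not part of the paper.
import torch

def magnitude_prune(model: torch.nn.Module, sparsity: float) -> dict:
    """Zero out the smallest-magnitude weights globally; return binary masks."""
    all_weights = torch.cat([p.detach().abs().flatten()
                             for p in model.parameters() if p.dim() > 1])
    k = int(sparsity * all_weights.numel())
    threshold = all_weights.kthvalue(k).values if k > 0 else torch.tensor(0.0)
    masks = {}
    for name, p in model.named_parameters():
        if p.dim() > 1:  # prune weight matrices/kernels, leave biases dense
            masks[name] = (p.detach().abs() > threshold).float()
            p.data.mul_(masks[name])
    return masks

def train_prune_retrain(model, train_one_epoch, train_loader,
                        sparsity=0.9, epochs=10):
    for _ in range(epochs):                       # 1) train the dense network
        train_one_epoch(model, train_loader)
    masks = magnitude_prune(model, sparsity)      # 2) prune by weight magnitude
    for _ in range(epochs):                       # 3) re-train the sparse network
        train_one_epoch(model, train_loader)
        for name, p in model.named_parameters():  #    keep pruned weights at zero
            if name in masks:
                p.data.mul_(masks[name])
    return model, masks
```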

Cited by 4 publications (5 citation statements)
References 18 publications
“…Later, Pensia et al. (2020) improved the widening factor to logarithmic, and Sreenivasan et al. (2021) proved that, with a polylogarithmic widening factor, such a result holds even if the network weights are binary. A follow-up work shows that it is possible to find a subnetwork achieving good performance at initialization and then fine-tune it (Sreenivasan et al., 2022). Our work, on the other hand, analyzes the gradient descent dynamics of a pruned neural network and its generalization after training.…”
Section: Related Work
confidence: 99%
“…This paper has inspired a quickly growing body of work that has found that a network contains not one but several tickets [18], which may be connected [19]. Moreover, after it was shown that tickets can be identified early in the training process [20], some methods have succeeded in pruning before training [21]-[30], even before looking at the data, in order to reduce training cost.…”
Section: The Weak Lottery Ticket Hypothesis
confidence: 99%
“…Here, we discuss possible ways of improving the performance of our method. Note that a recent work (Sreenivasan et al. 2022) on pruning discriminative networks found two methods to improve the performance of EP: (1) using global EP (pruning weights by sorting the scores globally) instead of vanilla EP (pruning weights by sorting the scores at each layer), and (2) using gradual pruning (moving from the dense regime to the sparse regime gradually during pruning) instead of vanilla EP, which moves to the sparse regime from the beginning. Inspired by this observation, we expect that applying EP with these two variants (global pruning and gradual pruning) in our method has the potential to improve the performance of the SLT in generative models.…”
Section: Factor Analysis
confidence: 99%
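
To make the two EP variants mentioned in the statement above concrete, here is an illustrative sketch (not code from either cited paper) of layer-wise versus global score thresholding, plus a gradual dense-to-sparse schedule; the function names, score dictionaries, and cubic schedule are assumptions for illustration only.

```python
# Illustrative sketch of the two edge-popup (EP) variants discussed above:
# (1) global vs. layer-wise score thresholding, and (2) a gradual sparsity
# schedule. Function names and the cubic schedule are assumptions, not the
# exact procedure of the cited papers.
import torch

def layerwise_masks(scores: dict, sparsity: float) -> dict:
    """Vanilla EP: keep the top-scoring weights within each layer separately."""
    masks = {}
    for name, s in scores.items():
        k = int(sparsity * s.numel())
        if k == 0:
            masks[name] = torch.ones_like(s)
            continue
        threshold = s.flatten().kthvalue(k).values
        masks[name] = (s > threshold).float()
    return masks

def global_masks(scores: dict, sparsity: float) -> dict:
    """Global EP: rank all scores together, so per-layer sparsity can differ."""
    all_scores = torch.cat([s.flatten() for s in scores.values()])
    k = int(sparsity * all_scores.numel())
    threshold = all_scores.kthvalue(k).values if k > 0 else float("-inf")
    return {name: (s > threshold).float() for name, s in scores.items()}

def gradual_sparsity(step: int, total_steps: int, final_sparsity: float) -> float:
    """Move from the dense regime (0% sparsity) toward the target sparsity over
    training; a cubic schedule is used here purely for illustration."""
    t = min(step / total_steps, 1.0)
    return final_sparsity * (1.0 - (1.0 - t) ** 3)
```

During training, one would recompute the masks at each step, e.g. `global_masks(scores, gradual_sparsity(step, total_steps, 0.99))`, so the subnetwork moves gradually from dense to sparse rather than starting in the sparse regime.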
“…In this work, we focus on the model pruning techniques that fall into the latter category. Unlike discriminative models, where various pruning techniques (LeCun, Denker, and Solla 1989; Hassibi and Stork 1992; Han et al. 2015; Frankle and Carbin 2018; Ramanujan et al. 2020; Sreenivasan et al. 2022) have been actively studied, pruning generative models has not been extensively explored. Moreover, it has been found that naïve application of existing pruning methods (developed for discriminative models) to generative models leads to performance degradation and/or unstable training (Wang et al. 2020; Li et al. 2021).…”
Section: Introduction
confidence: 99%