2020
DOI: 10.48550/arxiv.2003.01794
Preprint

Good Subnetworks Provably Exist: Pruning via Greedy Forward Selection

Mao Ye, Chengyue Gong, Lizhen Nie, et al.

Abstract: Recent empirical works show that large deep neural networks are often highly redundant and that one can find much smaller subnetworks without a significant drop in accuracy. However, most existing network pruning methods are empirical and heuristic, leaving it open whether good subnetworks provably exist, how to find them efficiently, and whether network pruning can be provably better than direct training using gradient descent. We answer these questions positively by proposing a simple greedy selection approach for f…
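The abstract describes pruning by greedy forward selection: start from an empty subnetwork and repeatedly add the neuron of the trained network that most reduces the loss. The sketch below illustrates that idea for a one-hidden-layer ReLU network; the function names, the squared-loss objective, and the mean-field averaging over selected neurons are assumptions made for illustration, not the authors' implementation.

import numpy as np

def subnet_predict(X, W, b, v, selected):
    # Output of the subnetwork: average of the selected ReLU neurons
    # (mean-field-style scaling by the number of selected neurons).
    if not selected:
        return np.zeros(len(X))
    idx = list(selected)
    hidden = np.maximum(X @ W[idx].T + b[idx], 0.0)
    return hidden @ v[idx] / len(idx)

def greedy_forward_selection(X, y, W, b, v, k):
    # Greedily grow a subnetwork of k neurons: at each step, add the neuron
    # of the (pretrained) network that most reduces the squared loss.
    selected = []
    for _ in range(k):
        best_j, best_loss = None, np.inf
        for j in range(len(W)):
            if j in selected:
                continue
            pred = subnet_predict(X, W, b, v, selected + [j])
            loss = np.mean((pred - y) ** 2)
            if loss < best_loss:
                best_j, best_loss = j, loss
        selected.append(best_j)
    return selected

# Toy usage with random "trained" weights (W, b: hidden layer; v: output layer).
rng = np.random.default_rng(0)
X, y = rng.normal(size=(64, 5)), rng.normal(size=64)
W, b, v = rng.normal(size=(50, 5)), rng.normal(size=50), rng.normal(size=50)
print(greedy_forward_selection(X, y, W, b, v, k=5))

Each selection step scans all remaining neurons, so the cost per step is linear in the width of the original network; the citation statements below discuss how fast the loss of such greedily grown subnetworks decays as the subnetwork size grows.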

Cited by 6 publications (19 citation statements) · References 25 publications
“…This is comparable to the asymptotic error obtained by directly training a neural network of size n with gradient descent, which is also O(n⁻¹) following the mean field analysis of Mei et al (2018); Araújo et al (2019); Sirignano & Spiliopoulos (2019). More recently, Ye et al (2020) proposed the first pruning method that achieves a faster O(n⁻²) error rate and is hence provably better than direct training with gradient descent. See Table 1 for a summary of those works.…”
Section: Introduction
confidence: 60%
“…Table 1: Overview of theoretically guaranteed pruning methods (with columns indicating whether each method avoids over-parameterization and applies to deep nets). Baykal et al (2019b); Liebenwein et al (2020): O(n⁻¹). Baykal et al (2019a); Mussay et al (2020): O(n⁻¹). Ye et al (2020): O(n⁻²), marked × for both columns. This paper: O(exp(−cn)). The rate gives how the error due to pruning decays as the size of the pruned network (n) increases.…”
Section: Rate
confidence: 99%
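Restating the rates quoted above as one display (a paraphrase of the cited Table 1; the symbols L_n for the loss of a pruned subnetwork with n neurons, L* for the limiting loss, and the constant c > 0 are notation introduced here, not taken from the papers):

\[
\mathcal{L}_n - \mathcal{L}^{*} \;=\;
\begin{cases}
O(n^{-1}) & \text{coreset-style pruning (Baykal et al.\ 2019a,b; Liebenwein et al.\ 2020; Mussay et al.\ 2020)}\\[2pt]
O(n^{-2}) & \text{greedy forward selection (Ye et al.\ 2020)}\\[2pt]
O(e^{-cn}) & \text{the citing paper}
\end{cases}
\]

That is, each guarantee bounds how quickly the excess loss introduced by pruning vanishes as the retained width n grows.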
“…Network Pruning [10,8,39,21,24,14,41,17,34,30,38] has been extensively studied in recent years to reduce model size and improve the inference efficiency of deep neural networks. Since it is widely recognized that modern neural networks are typically over-parameterized, pruning methods have been developed to remove unimportant parameters from fully trained dense networks and alleviate this redundancy.…”
Section: Network Pruning
confidence: 99%