2020
DOI: 10.48550/arxiv.2003.01794
Preprint

Good Subnetworks Provably Exist: Pruning via Greedy Forward Selection

Mao Ye, Chengyue Gong, Lizhen Nie, et al.

Abstract: Recent empirical works show that large deep neural networks are often highly redundant and that one can find much smaller subnetworks without a significant drop in accuracy. However, most existing network pruning methods are empirical and heuristic, leaving it open whether good subnetworks provably exist, how to find them efficiently, and whether network pruning can be provably better than direct training using gradient descent. We answer these questions positively by proposing a simple greedy selection approach for f…
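The abstract describes pruning by greedy forward selection: start from an empty subnetwork and repeatedly add the neuron of the trained network that most reduces the loss. The sketch below illustrates that idea for a one-hidden-layer ReLU network; the function names, the squared-loss objective, and the mean-field averaging over selected neurons are assumptions made for illustration, not the authors' implementation.

import numpy as np

def subnet_predict(X, W, b, v, selected):
    # Output of the subnetwork: average of the selected ReLU neurons
    # (mean-field-style scaling by the number of selected neurons).
    if not selected:
        return np.zeros(len(X))
    idx = list(selected)
    hidden = np.maximum(X @ W[idx].T + b[idx], 0.0)
    return hidden @ v[idx] / len(idx)

def greedy_forward_selection(X, y, W, b, v, k):
    # Greedily grow a subnetwork of k neurons: at each step, add the neuron
    # of the (pretrained) network that most reduces the squared loss.
    selected = []
    for _ in range(k):
        best_j, best_loss = None, np.inf
        for j in range(len(W)):
            if j in selected:
                continue
            pred = subnet_predict(X, W, b, v, selected + [j])
            loss = np.mean((pred - y) ** 2)
            if loss < best_loss:
                best_j, best_loss = j, loss
        selected.append(best_j)
    return selected

# Toy usage with random "trained" weights (W, b: hidden layer; v: output layer).
rng = np.random.default_rng(0)
X, y = rng.normal(size=(64, 5)), rng.normal(size=64)
W, b, v = rng.normal(size=(50, 5)), rng.normal(size=50), rng.normal(size=50)
print(greedy_forward_selection(X, y, W, b, v, k=5))

Each selection step scans all remaining neurons, so the cost per step is linear in the width of the original network; the citation statements below discuss how fast the loss of such greedily grown subnetworks decays as the subnetwork size grows.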

Cited by 6 publications (19 citation statements) · References 25 publications
“…This is comparable to the asymptotic error obtained by directly training a neural network of size n with gradient descent, which is also O(n⁻¹) following the mean field analysis of Mei et al (2018); Araújo et al (2019); Sirignano & Spiliopoulos (2019). More recently, Ye et al (2020) proposed the first pruning method that achieves a faster O(n⁻²) error rate and is hence provably better than direct training with gradient descent. See Table 1 for a summary of those works.…”
Section: Introduction
confidence: 60%
“…Table 1: Overview of theoretically guaranteed pruning methods (with columns indicating whether each method avoids over-parameterization and applies to deep nets). Baykal et al (2019b); Liebenwein et al (2020): O(n⁻¹). Baykal et al (2019a); Mussay et al (2020): O(n⁻¹). Ye et al (2020): O(n⁻²), marked × for both columns. This paper: O(exp(−cn)). The rate gives how the error due to pruning decays as the size of the pruned network (n) increases.…”
Section: Rate
confidence: 99%
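Restating the rates quoted above as one display (a paraphrase of the cited Table 1; the symbols L_n for the loss of a pruned subnetwork with n neurons, L* for the limiting loss, and the constant c > 0 are notation introduced here, not taken from the papers):

\[
\mathcal{L}_n - \mathcal{L}^{*} \;=\;
\begin{cases}
O(n^{-1}) & \text{coreset-style pruning (Baykal et al.\ 2019a,b; Liebenwein et al.\ 2020; Mussay et al.\ 2020)}\\[2pt]
O(n^{-2}) & \text{greedy forward selection (Ye et al.\ 2020)}\\[2pt]
O(e^{-cn}) & \text{the citing paper}
\end{cases}
\]

That is, each guarantee bounds how quickly the excess loss introduced by pruning vanishes as the retained width n grows.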
“…Network Pruning [10,8,39,21,24,14,41,17,34,30,38] has been extensively studied in recent years to reduce model size and improve the inference efficiency of deep neural networks. Since it is widely recognized that modern neural networks are typically over-parameterized, pruning methods have been developed to remove unimportant parameters from fully trained dense networks and alleviate this redundancy.…”
Section: Network Pruning
confidence: 99%