2023
DOI: 10.48550/arxiv.2303.04185
Preprint

Gradient-Free Structured Pruning with Unlabeled Data

Cited by 1 publication (9 citation statements) | References 0 publications
“…However, as the size and complexity of LLMs rapidly increase [29,30,31], this conventional approach becomes impractical and costly, prompting the need for retraining-free compression techniques. Recent developments in this area have primarily centered around quantization [32,33,34] and have expanded to include pruning methods [13,15,14] that eliminate the need for retraining. In this paper, our work targets enhancing the performance of the retraining-free pruning paradigm, which can reduce the model size, lower the memory consumption, accelerate the inference, and be orthogonal and compatible with quantization for further compression simultaneously.…”
Section: Network Pruning For Language Models
confidence: 99%
“…In the context of network pruning, retraining-free approaches such as those proposed by [35] seek to mitigate output distortion instead of retraining to maintain as much of the model's original capability as possible. Mask-Tuning, introduced by [13] and adopted by KCM [15], involves rescaling the mask as a reconstruction technique. While it tests the limits of encoder-based models, it struggles to maintain performance at high pruning ratios.…”
Section: Distortion Reconstruction For Retraining-free Pruning
confidence: 99%
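The mask-rescaling idea mentioned in this citation statement can be illustrated with a small numerical sketch. The snippet below is an assumption-laden toy example, not the actual procedure of [13], [15], or [35]: it prunes half the hidden neurons of a toy feed-forward layer and then fits per-neuron rescaling factors by least squares on unlabeled calibration inputs, so that the pruned layer reconstructs the dense layer's output without gradients or retraining. All names (W1, W2, keep, s) are hypothetical.

```python
# Minimal sketch of mask rescaling for retraining-free structured pruning
# (illustrative only; not the cited papers' exact method).
import numpy as np

rng = np.random.default_rng(0)

# Toy dense FFN layer: y = relu(x @ W1) @ W2
d_model, d_ff, n_calib = 16, 64, 256
W1 = rng.normal(size=(d_model, d_ff))
W2 = rng.normal(size=(d_ff, d_model))
X = rng.normal(size=(n_calib, d_model))        # unlabeled calibration data

H = np.maximum(X @ W1, 0.0)                    # hidden activations
Y_dense = H @ W2                               # original layer output

# Structured pruning: keep the half of the neurons with the largest
# average activation magnitude (a simple stand-in importance score).
importance = np.abs(H).mean(axis=0)
keep = np.argsort(importance)[d_ff // 2:]

# Binary mask baseline: drop pruned neurons, keep the rest unscaled.
Y_binary = H[:, keep] @ W2[keep, :]

# Rescaled mask: solve per-neuron scales s minimizing
# || (H[:, keep] * s) @ W2[keep, :] - Y_dense ||^2 by least squares
# on the flattened output, instead of retraining any weights.
A = np.stack([np.outer(H[:, j], W2[j, :]).ravel() for j in keep], axis=1)
s, *_ = np.linalg.lstsq(A, Y_dense.ravel(), rcond=None)
Y_rescaled = (H[:, keep] * s) @ W2[keep, :]

err_binary = np.linalg.norm(Y_dense - Y_binary) / np.linalg.norm(Y_dense)
err_rescaled = np.linalg.norm(Y_dense - Y_rescaled) / np.linalg.norm(Y_dense)
print(f"relative output distortion, binary mask:   {err_binary:.3f}")
print(f"relative output distortion, rescaled mask: {err_rescaled:.3f}")
```

On this toy layer the rescaled mask yields lower output distortion than the plain binary mask, which is the effect the reconstruction step is meant to achieve; the cited works report that such rescaling still degrades at high pruning ratios.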