2022
DOI: 10.1609/aaai.v36i3.20222

Width & Depth Pruning for Vision Transformers

Abstract: Transformer models have demonstrated promising potential and achieved excellent performance on a range of computer vision tasks. However, the huge computational cost of vision transformers hinders their deployment and application on edge devices. Recent works have proposed to find and remove the unimportant units of vision transformers. Despite achieving remarkable results, these methods only take the dimension of network width into consideration and ignore network depth, which is another important dimension …
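As a rough illustration of the two dimensions the abstract contrasts, the sketch below prunes a toy block list along width (hidden units inside a block, ranked here by a simple L1-norm score) and along depth (dropping whole blocks). The TinyBlock class, the L1 criterion, and the keep ratios are illustrative assumptions for this sketch, not the selection rule proposed in the paper.

```python
# A minimal, self-contained sketch (PyTorch) of the two pruning dimensions:
# "width" = units inside a block, "depth" = whole blocks.
# TinyBlock and the L1-norm score are illustrative assumptions, not the paper's method.
import torch
import torch.nn as nn

class TinyBlock(nn.Module):
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.fc1 = nn.Linear(dim, hidden)   # the "width" lives in this hidden dimension
        self.fc2 = nn.Linear(hidden, dim)

    def forward(self, x):
        return x + self.fc2(torch.relu(self.fc1(x)))  # residual keeps the outer shape fixed

def prune_width(block: TinyBlock, keep: int) -> TinyBlock:
    """Keep the `keep` hidden units with the largest L1 weight norm."""
    scores = block.fc1.weight.abs().sum(dim=1)               # one score per hidden unit
    idx = scores.topk(keep).indices
    new = TinyBlock(block.fc1.in_features, keep)
    new.fc1.weight.data = block.fc1.weight.data[idx].clone()
    new.fc1.bias.data = block.fc1.bias.data[idx].clone()
    new.fc2.weight.data = block.fc2.weight.data[:, idx].clone()
    new.fc2.bias.data = block.fc2.bias.data.clone()
    return new

def prune_depth(blocks: nn.ModuleList, keep_mask: list) -> nn.ModuleList:
    """Drop whole blocks flagged as unimportant; the residual form keeps shapes valid."""
    return nn.ModuleList(b for b, k in zip(blocks, keep_mask) if k)

blocks = nn.ModuleList(TinyBlock(64, 256) for _ in range(12))
blocks = nn.ModuleList(prune_width(b, keep=128) for b in blocks)      # width pruning
blocks = prune_depth(blocks, [i % 2 == 0 for i in range(12)])         # depth pruning
x = torch.randn(1, 16, 64)
for b in blocks:
    x = b(x)
print(x.shape)  # torch.Size([1, 16, 64]), now from 6 blocks of width 128
```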

Cited by 53 publications (23 citation statements)
References 26 publications
“…The first group of methods focuses on reducing the complexity of the attention module by imposing the locality of input images adaptively [36], [37]. The second group applies pruning methods to remove unimportant components (e.g., partial channels) [38] or inputs (i.e., patches) [39] to a ViT model. The third group of methods uses neural architecture search (NAS) techniques to design efficient ViT models by optimizing architectural hyperparameters, such as channels and depth [40], [41].…”
Section: Vision Transformers (mentioning, confidence: 99%)
“…Previous works [6,7,8,9] have addressed the redundancy of input image patches in vision transformers and have attempted to optimize patch slimming to enhance computation efficiency. Most of these approaches [7,8,9] have utilized the same paradigm, classifying image patches in each layer into two classes, one for keeping and another for discarding, using an attentive score.…”
Section: Self-Attention (mentioning, confidence: 99%)
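The keep/discard paradigm described in this excerpt can be sketched as follows: score each patch token by the attention it receives from the class token and retain the top-k. The slim_patches helper, the class-token scoring rule, and the tensor shapes are assumptions chosen for illustration; the cited works each define their own attentive score.

```python
# Hedged sketch (PyTorch) of attentive-score patch slimming: keep the patches that
# receive the most class-token attention, discard the rest. The scoring rule and
# the slim_patches helper are illustrative assumptions, not any one cited method.
import torch

def slim_patches(tokens: torch.Tensor, cls_attn: torch.Tensor, keep_ratio: float) -> torch.Tensor:
    """tokens: (B, 1 + N, C) with the class token first; cls_attn: (B, N) scores."""
    B, n_plus_1, C = tokens.shape
    n_keep = max(1, int((n_plus_1 - 1) * keep_ratio))
    idx = cls_attn.topk(n_keep, dim=1).indices                     # (B, n_keep) kept patches
    patches = tokens[:, 1:, :]                                     # drop the class token
    kept = patches.gather(1, idx.unsqueeze(-1).expand(-1, -1, C))  # gather kept patches
    return torch.cat([tokens[:, :1, :], kept], dim=1)              # re-attach the class token

x = torch.randn(2, 1 + 196, 384)           # e.g. a DeiT-S-sized token sequence
attn = torch.rand(2, 196).softmax(dim=-1)  # stand-in for real class-token attention weights
print(slim_patches(x, attn, keep_ratio=0.5).shape)  # torch.Size([2, 99, 384])
```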
“…To compress and accelerate transformer models, a variety of techniques naturally emerge. Popular approaches include weight quantization [34], knowledge distillation [31], filter compression [27], and model pruning [24]. Among them, model pruning, especially structured pruning, has gained considerable interest: it removes the least important parameters from pre-trained models in a hardware-friendly manner, and is thus the focus of our paper.…”
Section: Introduction (mentioning, confidence: 99%)
“…(1) Criterion-based pruning resorts to preserving the most important weights/attentions by employing pre-defined criteria, e.g., the L1/L2 norm [23], or activation values [6]. (2) Training-based pruning retrains models with hand-crafted sparse regularizations [36] or resource constraints [31,32].…”
Section: Introduction (mentioning, confidence: 99%)
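The two families in this excerpt can be sketched on a single linear layer: a criterion-based mask built from the per-channel L1 norm, and a training-based sparsity penalty added to the loss so that unimportant channels can later be thresholded away. The layer sizes, the 50% keep ratio, and the sparse_penalty helper are hypothetical, not the procedures of the cited references.

```python
# Hedged sketch (PyTorch) of the two pruning families on one linear layer.
# The layer sizes, keep ratio, and penalty weight are hypothetical.
import torch
import torch.nn as nn

layer = nn.Linear(256, 512)

# (1) Criterion-based: rank output channels by their L1 norm and keep the top 50%.
scores = layer.weight.abs().sum(dim=1)        # one importance score per output channel
keep = scores.topk(k=256).indices             # channels preserved by the criterion
mask = torch.zeros(512, dtype=torch.bool)
mask[keep] = True                             # True = keep, False = prune

# (2) Training-based: an L1 sparsity regularizer added to the task loss so that
# unimportant weights are driven toward zero and can be removed after retraining.
def sparse_penalty(module: nn.Linear, lam: float = 1e-4) -> torch.Tensor:
    return lam * module.weight.abs().sum()

task_loss = torch.tensor(0.0)                 # placeholder for the real task loss
total_loss = task_loss + sparse_penalty(layer)
```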