2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr52688.2022.01197
CHEX: CHannel EXploration for CNN Model Compression

Cited by 55 publications (31 citation statements)
References 40 publications
“…For spatiotemporal learning, BEVT [75] and VideoMAE [70,25] can be seen as extensions of BeiT and MAE, respectively. Recent works also indicate that CLIP features provide good guidance for mask modeling [78,33,60,59,84], but all of them actually perform worse than CLIP itself with elaborate fine-tuning [21]. In contrast, we demonstrate that in the video domain, our model with CLIP supervision clearly outperforms the teacher.…”
Section: Related Work
Mentioning confidence: 56%
“…However, the aggressive random masking may only retain the background tokens, which contain insignificant information and hinder the teacher's knowledge transfer. To enhance target effectiveness, we apply the semantic masking [33] frame by frame, where the tokens with important clues are maintained at higher probabilities. Specifically, given the class token z_cls ∈ R^{1×C} and the spatial tokens Z ∈ R^{L×C} in the t-th frame of CLIP-ViT (L = H×W is the token number and C is the token dimension), we calculate the attention score in the last self-attention [23] layer:…”
Section: Unmasked Teacher
Mentioning confidence: 99%
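The attention-score computation in this excerpt is cut off. As a rough illustration of attention-guided ("semantic") masking under the setup described, here is a minimal PyTorch sketch: the class token's attention over the L spatial tokens ranks their importance, and high-score tokens are kept with higher probability. The function name, the keep ratio, and the multinomial sampling scheme are assumptions for illustration, not the cited paper's exact procedure.

```python
# Minimal sketch (not the authors' code) of attention-guided "semantic"
# masking: the class token's attention over spatial tokens in the last
# self-attention layer ranks importance; high-score tokens are kept
# (left unmasked) with higher probability.
import torch

def semantic_mask(q_cls, k_spatial, keep_ratio=0.2, temperature=1.0):
    """q_cls: (1, C) class-token query; k_spatial: (L, C) spatial-token keys.
    Returns a boolean mask of shape (L,), True = token kept.
    Names and the sampling scheme are illustrative assumptions."""
    C = q_cls.shape[-1]
    # Attention of the class token over the L spatial tokens.
    attn = torch.softmax(q_cls @ k_spatial.T / C ** 0.5, dim=-1).squeeze(0)  # (L,)
    # Bias the kept set toward high-attention tokens via multinomial sampling.
    probs = torch.softmax(attn / temperature, dim=-1)
    L = attn.shape[0]
    num_keep = max(1, int(keep_ratio * L))
    keep_idx = torch.multinomial(probs, num_keep, replacement=False)
    mask = torch.zeros(L, dtype=torch.bool)
    mask[keep_idx] = True
    return mask
```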
“…Compared with the state-of-the-art automatic low-rank compression method ALDS, HALOC enjoys 1.4% higher top-1 accuracy with much lower computational costs. Compared with the state-of-the-art pruning work CHEX (Hou et al. 2022), HALOC also achieves a 0.54% accuracy increase with much fewer FLOPs. Besides, our approach also shows impressive performance for compressing MobileNetV2, a task that is conventionally challenging for low-rank compression approaches.…”
Section: ImageNet Results
Mentioning confidence: 99%
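For context on what "low-rank compression" means in this comparison, below is a minimal sketch of the generic idea: factorizing a linear layer's weight matrix with a truncated SVD so one wide layer becomes two thin ones. This is not HALOC's or ALDS's actual algorithm; factorize_linear and the fixed rank argument are illustrative assumptions.

```python
# Minimal sketch of low-rank weight compression via truncated SVD.
# Illustrates the general idea behind low-rank methods, not any
# specific paper's algorithm.
import torch
import torch.nn as nn

def factorize_linear(layer: nn.Linear, rank: int) -> nn.Sequential:
    """Replace an (out, in) linear layer with (in -> rank) then (rank -> out)."""
    W = layer.weight.data                          # (out, in)
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    U_r = U[:, :rank] * S[:rank]                   # (out, rank)
    V_r = Vh[:rank, :]                             # (rank, in)
    first = nn.Linear(layer.in_features, rank, bias=False)
    second = nn.Linear(rank, layer.out_features, bias=layer.bias is not None)
    first.weight.data = V_r
    second.weight.data = U_r
    if layer.bias is not None:
        second.bias.data = layer.bias.data
    # second(first(x)) ≈ W @ x, with far fewer parameters when rank is small.
    return nn.Sequential(first, second)
```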
“…Shen et al. [42] pruned channels globally based on magnitude and gradient criteria. Unlike pruning-only methods, Hou et al. [43] proposed a pruning-and-regrowing method to avoid removing important channels. These methods can compress a network while ensuring high performance.…”
Section: Related Work
Mentioning confidence: 99%
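To make the pruning-and-regrowing idea in this excerpt concrete, here is a minimal PyTorch sketch: output channels are scored by filter magnitude, the lowest-scoring ones are pruned, and a few pruned channels are then regrown. The scoring function, the ratios, and the regrow criterion here are simplified assumptions; CHEX's actual channel-exploration criterion differs.

```python
# Minimal sketch of channel pruning with regrowing, in the spirit of the
# prune-and-regrow idea cited above. Importance score, prune ratio, and
# regrow criterion are illustrative assumptions, not CHEX's method.
import torch

def prune_and_regrow(conv_weight, prune_ratio=0.5, regrow_ratio=0.1):
    """conv_weight: (out_ch, in_ch, kH, kW). Returns a boolean channel mask
    (True = kept) after magnitude pruning and regrowing a few channels."""
    out_ch = conv_weight.shape[0]
    # Magnitude criterion: L2 norm of each output channel's filter.
    importance = conv_weight.flatten(1).norm(dim=1)          # (out_ch,)
    num_keep = out_ch - int(prune_ratio * out_ch)
    kept = torch.zeros(out_ch, dtype=torch.bool)
    kept[importance.topk(num_keep).indices] = True
    # Regrow step: reactivate some pruned channels (here, the highest-
    # magnitude ones among the pruned set, purely for illustration).
    pruned_idx = (~kept).nonzero().squeeze(1)
    num_regrow = int(regrow_ratio * out_ch)
    if num_regrow > 0 and pruned_idx.numel() > 0:
        order = importance[pruned_idx].argsort(descending=True)
        kept[pruned_idx[order[:num_regrow]]] = True
    return kept
```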