2021
DOI: 10.48550/arxiv.2112.00265
Preprint

Training BatchNorm Only in Neural Architecture Search and Beyond

Abstract: This work investigates the use of batch normalization in neural architecture search (NAS). Specifically, Frankle et al. [22] find that training BatchNorm only can achieve nontrivial performance. Furthermore, Chen et al. [9] claim that training BatchNorm only can speed up the training of the one-shot NAS supernet by over ten times. Critically, there has been no effort to understand 1) why training BatchNorm only can find well-performing architectures with reduced supernet-training time, and 2) what is the difference…
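The core idea, freezing all weights at their random initialization and updating only the BatchNorm affine parameters (gamma and beta), is easy to prototype. Below is a minimal PyTorch sketch, assuming a generic convolutional model as a stand-in for the one-shot supernet; the layer sizes and optimizer settings are illustrative and are not taken from the paper.

import torch
import torch.nn as nn

_BN_TYPES = (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)

def freeze_all_but_batchnorm(model: nn.Module) -> None:
    """Freeze every parameter, then re-enable gradients only for the BatchNorm
    affine terms (gamma/beta), mirroring the 'train BatchNorm only' setup."""
    for p in model.parameters():
        p.requires_grad = False
    for module in model.modules():
        if isinstance(module, _BN_TYPES) and module.affine:
            module.weight.requires_grad = True  # gamma
            module.bias.requires_grad = True    # beta

# Hypothetical stand-in for a one-shot NAS supernet (architecture is illustrative).
supernet = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.BatchNorm2d(16), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 10),
)

freeze_all_but_batchnorm(supernet)

# Only the BatchNorm parameters reach the optimizer, so each update step
# touches a tiny fraction of the network's weights.
trainable = [p for p in supernet.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=0.1, momentum=0.9)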

Cited by 1 publication (1 citation statement)
References: 44 publications
“…However, computational efficiency is critical in real-world scenarios, where the executed computation is translated into power consumption or carbon emissions. Many works have tried to reduce the computational cost of CNNs via neural architecture search [10,16,25,54,57], knowledge distillation [20,55], dynamic routing [4,13,43,51,56] and pruning [15,18], but how to accelerate ViT models has rarely been explored.…”
Section: Model Compression
Citation type: mentioning
Confidence: 99%