2020
DOI: 10.1007/978-3-030-58580-8_36
Circumventing Outliers of AutoAugment with Knowledge Distillation

Cited by 41 publications (27 citation statements)
References 32 publications
“…Vision transformers have also been applied to medical image classification (Dai et al., 2021). Furthermore, knowledge distillation (Hinton et al., 2015; Wei et al., 2020) from a CNN-based model has been shown to be effective in improving the performance of the vision transformer (Touvron et al., 2020). Instead of simply regarding image patches as tokens, Yuan et al. proposed a tokens-to-token (T2T) method that tokenizes patches while taking image structure into account (Yuan et al., 2021).…”
Section: Related Work
confidence: 99%
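For context, the knowledge distillation mentioned in the statement above follows the soft-label formulation of Hinton et al. (2015), in which a student is trained to match the teacher's temperature-softened output distribution in addition to the ground-truth labels. The following is a minimal PyTorch sketch of that objective; the temperature T, weight alpha, and the function name are illustrative choices, not taken from any of the cited papers.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    # Soft target: KL divergence between the teacher's and the student's
    # temperature-softened class distributions (Hinton et al., 2015).
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradients are comparable to the hard-label term
    # Hard target: standard cross-entropy on the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```

In the vision-transformer setting cited above (Touvron et al., 2020), the teacher would typically be a CNN and the student the transformer.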
“…By sampling such transformations, DHA can pay more attention to more aggressive DA strategies and increase model robustness against difficult samples (Zhang et al., 2020). However, blindly increasing the difficulty of samples may cause the augment ambiguity phenomenon (Wei et al., 2020): augmented images may lie far away from the majority of clean images, which can cause the model to under-fit and deteriorate the learning process. Hence, besides optimizing the probability matrix of DA strategies, we randomly sample the magnitude of each chosen strategy from a uniform distribution, which prevents the search from favoring heavy DA strategies, i.e., augmenting samples with large-magnitude transformations.…”
Section: Data Augmentation Parameters
confidence: 99%
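A minimal sketch of the magnitude-sampling idea described in this statement: the operation is drawn according to the learned probabilities, but its magnitude is drawn uniformly rather than optimized, which keeps the policy from collapsing onto always-heavy augmentations. The operation names and the probability-dictionary interface below are placeholders, not the DHA implementation.

```python
import random

# Placeholder operation names in the style of AutoAugment; the actual
# transformations would come from whatever augmentation library is in use.
OPERATIONS = ["rotate", "shear_x", "color", "contrast", "posterize"]

def sample_augmentation(op_probs, max_magnitude=1.0):
    # Choose an operation according to the learned probability distribution...
    op = random.choices(OPERATIONS, weights=[op_probs[o] for o in OPERATIONS], k=1)[0]
    # ...but draw its magnitude uniformly instead of learning it, so the
    # search cannot drift toward always applying large-magnitude transforms.
    magnitude = random.uniform(0.0, max_magnitude)
    return op, magnitude
```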
“…Hence, besides optimizing the probability matrix of DA strategies, we randomly sample the magnitude of each chosen strategy from a uniform distribution, which prevents the search from favoring heavy DA strategies, i.e., augmenting samples with large-magnitude transformations. Moreover, instead of training a controller to generate adversarial augmentation policies via reinforcement learning (Zhang et al., 2020) or training an extra teacher model to generate additional labels for augmented samples (Wei et al., 2020), we search for the probability distribution of augmentation transformations directly via gradient-based optimization. In this way, the optimization of data augmentation is very efficient and adds little computing cost.…”
Section: Data Augmentation Parameters
confidence: 99%
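The gradient-based search described here requires the choice of augmentation operation to be differentiable with respect to the probability parameters. The DHA paper's exact relaxation is not reproduced in this report; as one common way to achieve this, the sketch below uses a Gumbel-softmax relaxation over learnable logits, so the task loss can be back-propagated into the augmentation distribution without an RL controller or an extra teacher.

```python
import torch
import torch.nn.functional as F

num_ops = 5                                            # number of candidate operations
aug_logits = torch.zeros(num_ops, requires_grad=True)  # learnable distribution parameters
optimizer = torch.optim.Adam([aug_logits], lr=1e-2)

def relaxed_op_weights(tau=1.0):
    # Differentiable, approximately one-hot weights over the operations;
    # gradients of the downstream loss flow back into aug_logits.
    return F.gumbel_softmax(aug_logits, tau=tau, hard=False)

# Training sketch: augment a batch using the relaxed weights, compute the
# model's loss on it, then update both the model and aug_logits, e.g.
#   loss.backward(); optimizer.step()
# The searched probability distribution is softmax(aug_logits).
```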
“…CondConv [36] uses AutoAugment [7] and mixup [37] as custom data augmentation. A concurrent work [34] reported that combining AutoAugment and knowledge distillation yields an even stronger performance boost, because the soft labels from knowledge distillation help alleviate label misalignment during aggressive data augmentation. In FBNetV3 [8], the training hyperparameters are treated as components of the search space and are obtained from AutoML-based joint architecture-recipe search.…”
Section: Comparison With Other Efficient Network
confidence: 99%
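The mechanism credited to [34] in this statement can be summarized as follows: instead of supervising an aggressively augmented image with its original, possibly no longer valid, hard label, a teacher network is run on the augmented image itself, and its soft prediction supplies a label that matches what is actually visible. The sketch below only illustrates that idea with assumed placeholder callables (teacher, augment_fn); it is not the loss formulation of [34].

```python
import torch
import torch.nn.functional as F

def soft_labels_for_augmented(teacher, images, augment_fn):
    # Apply the (aggressive) AutoAugment-style policy first.
    augmented = augment_fn(images)
    # The frozen teacher scores the augmented image, so the resulting soft
    # label reflects the post-augmentation content (e.g. an object that was
    # largely cropped out no longer dominates the label).
    with torch.no_grad():
        soft = F.softmax(teacher(augmented), dim=-1)
    # The student is then trained on (augmented, soft) rather than relying
    # solely on the original, possibly misaligned, hard label.
    return augmented, soft
```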