2020
DOI: 10.1007/978-3-030-58580-8_36
Circumventing Outliers of AutoAugment with Knowledge Distillation

Cited by 41 publications (27 citation statements)
References 32 publications
“…Vision transformers have also been applied to medical image classification (Dai et al., 2021). Furthermore, knowledge distillation (Hinton et al., 2015; Wei et al., 2020) from a CNN-based model has been shown to be effective in improving the performance of the vision transformer (Touvron et al., 2020). Instead of simply regarding image patches as tokens, Yuan et al. proposed a tokens-to-token (T2T) method that tokenizes patches while taking image structure into account (Yuan et al., 2021).…”
Section: Related Work
confidence: 99%
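For context, the knowledge distillation mentioned in the statement above follows the soft-label formulation of Hinton et al. (2015), in which a student is trained to match the teacher's temperature-softened output distribution in addition to the ground-truth labels. The following is a minimal PyTorch sketch of that objective; the temperature T, weight alpha, and the function name are illustrative choices, not taken from any of the cited papers.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    # Soft target: KL divergence between the teacher's and the student's
    # temperature-softened class distributions (Hinton et al., 2015).
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradients are comparable to the hard-label term
    # Hard target: standard cross-entropy on the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```

In the vision-transformer setting cited above (Touvron et al., 2020), the teacher would typically be a CNN and the student the transformer.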
“…By sampling such transformations, DHA can pay more attention to more aggressive DA strategies and increase model robustness against difficult samples (Zhang et al., 2020). However, blindly increasing the difficulty of samples may cause the augment ambiguity phenomenon (Wei et al., 2020): augmented images may lie far away from the majority of clean images, which can cause the model to under-fit and deteriorate the learning process. Hence, besides optimizing the probability matrix of DA strategies, we randomly sample the magnitude of each chosen strategy from a uniform distribution, which prevents the search from favoring heavy DA strategies, i.e., augmenting samples with large-magnitude transformations.…”
Section: Data Augmentation Parameters
confidence: 99%
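A minimal sketch of the magnitude-sampling idea described in this statement: the operation is drawn according to the learned probabilities, but its magnitude is drawn uniformly rather than optimized, which keeps the policy from collapsing onto always-heavy augmentations. The operation names and the probability-dictionary interface below are placeholders, not the DHA implementation.

```python
import random

# Placeholder operation names in the style of AutoAugment; the actual
# transformations would come from whatever augmentation library is in use.
OPERATIONS = ["rotate", "shear_x", "color", "contrast", "posterize"]

def sample_augmentation(op_probs, max_magnitude=1.0):
    # Choose an operation according to the learned probability distribution...
    op = random.choices(OPERATIONS, weights=[op_probs[o] for o in OPERATIONS], k=1)[0]
    # ...but draw its magnitude uniformly instead of learning it, so the
    # search cannot drift toward always applying large-magnitude transforms.
    magnitude = random.uniform(0.0, max_magnitude)
    return op, magnitude
```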
“…Hence, besides optimizing the probability matrix of DA strategies, we randomly sample the magnitude of each chosen strategy from a uniform distribution, which prevents the search from favoring heavy DA strategies, i.e., augmenting samples with large-magnitude transformations. Moreover, instead of training a controller to generate adversarial augmentation policies via reinforcement learning (Zhang et al., 2020) or training an extra teacher model to generate additional labels for augmented samples (Wei et al., 2020), we search for the probability distribution of augmentation transformations directly via gradient-based optimization. In this way, the optimization of data augmentation is very efficient and adds little computing cost.…”
Section: Data Augmentation Parameters
confidence: 99%
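The gradient-based search described here requires the choice of augmentation operation to be differentiable with respect to the probability parameters. The DHA paper's exact relaxation is not reproduced in this report; as one common way to achieve this, the sketch below uses a Gumbel-softmax relaxation over learnable logits, so the task loss can be back-propagated into the augmentation distribution without an RL controller or an extra teacher.

```python
import torch
import torch.nn.functional as F

num_ops = 5                                            # number of candidate operations
aug_logits = torch.zeros(num_ops, requires_grad=True)  # learnable distribution parameters
optimizer = torch.optim.Adam([aug_logits], lr=1e-2)

def relaxed_op_weights(tau=1.0):
    # Differentiable, approximately one-hot weights over the operations;
    # gradients of the downstream loss flow back into aug_logits.
    return F.gumbel_softmax(aug_logits, tau=tau, hard=False)

# Training sketch: augment a batch using the relaxed weights, compute the
# model's loss on it, then update both the model and aug_logits, e.g.
#   loss.backward(); optimizer.step()
# The searched probability distribution is softmax(aug_logits).
```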
“…CondConv [36] uses AutoAugment [7] and mixup [37] as custom data augmentation. A concurrent work [34] reported that combining AutoAugment and knowledge distillation yields an even stronger performance boost, because the soft labels from knowledge distillation help alleviate label misalignment during aggressive data augmentation. In FBNetV3 [8], the training hyperparameters are treated as components of the search space and are obtained from AutoML-based joint architecture-recipe search.…”
Section: Comparison With Other Efficient Network
confidence: 99%
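The mechanism credited to [34] in this statement can be summarized as follows: instead of supervising an aggressively augmented image with its original, possibly no longer valid, hard label, a teacher network is run on the augmented image itself, and its soft prediction supplies a label that matches what is actually visible. The sketch below only illustrates that idea with assumed placeholder callables (teacher, augment_fn); it is not the loss formulation of [34].

```python
import torch
import torch.nn.functional as F

def soft_labels_for_augmented(teacher, images, augment_fn):
    # Apply the (aggressive) AutoAugment-style policy first.
    augmented = augment_fn(images)
    # The frozen teacher scores the augmented image, so the resulting soft
    # label reflects the post-augmentation content (e.g. an object that was
    # largely cropped out no longer dominates the label).
    with torch.no_grad():
        soft = F.softmax(teacher(augmented), dim=-1)
    # The student is then trained on (augmented, soft) rather than relying
    # solely on the original, possibly misaligned, hard label.
    return augmented, soft
```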