2021
DOI: 10.48550/arxiv.2110.03141
Preprint

Efficient Sharpness-aware Minimization for Improved Training of Neural Networks

Cited by 12 publications (25 citation statements)
References 13 publications
“…Recently, sharpness-aware minimization (SAM) [16] seeks to find parameters that lie in a region with both a low loss value and low loss sharpness, and shows promising performance across various architectures and benchmark datasets. Moreover, several methods have been proposed to improve the performance [31] or efficiency [13] of SAM. Specifically, ASAM [31] introduces the concept of adaptive sharpness to mitigate the effect of parameter re-scaling, while ESAM [13] reduces the computational overhead without a performance drop.…”
Section: Related Work (mentioning)
confidence: 99%
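For reference, the sharpness-aware objective described in the excerpt above can be written as a min-max problem, roughly following the formulation of the SAM paper [16]; this is a sketch of the idea, where L_S denotes the training loss, w the model weights, and ρ the radius of the perturbation neighborhood:

\min_{w} \; \max_{\|\epsilon\|_2 \le \rho} \; L_{\mathcal{S}}(w + \epsilon)

SAM approximates the inner maximization with a single gradient ascent step, so each update requires two forward/backward passes; this extra cost is the overhead that ESAM [13] targets.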
“…Moreover, several methods have been proposed to improve the performance [31] or efficiency [13] of SAM. Specifically, ASAM [31] introduces the concept of adaptive sharpness to mitigate the effect of parameter re-scaling, while ESAM [13] reduces the computational overhead without a performance drop. Compared with these existing methods, our proposed SAQ focuses on improving the generalization performance of the quantized models.…”
Section: Related Work (mentioning)
confidence: 99%
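Regarding the adaptive sharpness mentioned above, ASAM [31] roughly replaces the fixed Euclidean perturbation ball with a neighborhood rescaled per parameter by a normalization operator T_w (in the element-wise variant, essentially T_w = |w|); as a sketch rather than the paper's exact definition:

\min_{w} \; \max_{\|T_{w}^{-1}\epsilon\| \le \rho} \; L_{\mathcal{S}}(w + \epsilon)

Because the allowed perturbation along each coordinate scales with the corresponding weight, the resulting sharpness measure is far less sensitive to the parameter re-scaling discussed in the excerpt.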
“…SAM explicitly penalizes a sharpness measure to obtain flat minima, which has achieved state-of-the-art results in several learning tasks [2,37]. Follow-up works include ESAM [6], GSAM [37], and SAF; GSAM is the state-of-the-art among SAM's follow-up works.…”
Section: Introduction (mentioning)
confidence: 99%
“…Liu et al. [21] and Du et al. [6] recently addressed the computational cost of SAM and proposed LookSAM [21] and Efficient SAM (ESAM) [6], respectively. LookSAM only minimizes the sharpness measure in the first of every five iterations.…”
Section: Introduction (mentioning)
confidence: 99%
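The schedule described in the excerpt above (performing the expensive sharpness step only on one out of every five iterations) can be illustrated with a short PyTorch-style training-loop sketch. This is a simplified illustration under assumed names (train_looksam_like, loss_fn, rho, k), not the authors' LookSAM implementation; in particular, LookSAM's reuse of a stored gradient component on the skipped iterations is omitted here.

    import torch

    def train_looksam_like(model, loss_fn, data_loader, optimizer, rho=0.05, k=5):
        # Sketch of a LookSAM-style schedule: the extra ascent/descent pass of SAM
        # is performed only on every k-th iteration (k = 5 in the excerpt above);
        # the remaining iterations use a plain single-pass gradient step.
        for step, (x, y) in enumerate(data_loader):
            optimizer.zero_grad()
            if step % k == 0:
                # SAM-style step: climb to w + epsilon along the normalized gradient,
                # take the gradient there, then restore the original weights.
                loss_fn(model(x), y).backward()
                params = [p for p in model.parameters() if p.grad is not None]
                grads = [p.grad.detach().clone() for p in params]
                grad_norm = torch.sqrt(sum((g ** 2).sum() for g in grads))
                scale = rho / (grad_norm + 1e-12)
                with torch.no_grad():
                    for p, g in zip(params, grads):
                        p.add_(g * scale)                # w -> w + epsilon
                optimizer.zero_grad()
                loss_fn(model(x), y).backward()          # gradient at w + epsilon
                with torch.no_grad():
                    for p, g in zip(params, grads):
                        p.sub_(g * scale)                # restore w
            else:
                # Cheap step: single forward/backward pass, as in standard training.
                loss_fn(model(x), y).backward()
            optimizer.step()

On four out of every five iterations only one forward/backward pass is needed instead of SAM's two, which is where the computational saving described in the excerpt comes from.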