2022
DOI: 10.48550/arxiv.2205.14083
Preprint

Sharpness-Aware Training for Free

Abstract: Modern deep neural networks (DNNs) have achieved state-of-the-art performances but are typically over-parameterized. The over-parameterization may result in undesirably large generalization error in the absence of other customized training strategies. Recently, a line of research under the name of Sharpness-Aware Minimization (SAM) has shown that minimizing a sharpness measure, which reflects the geometry of the loss landscape, can significantly reduce the generalization error. However, SAM-like methods incur …
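For context, the sharpness measure that SAM-style methods minimize is usually written as a min-max problem over an ℓ2 ball around the weights. The formula below is the standard SAM formulation (Foret et al.) in common notation, not a formula quoted from this preprint:

```latex
\min_{w} \; \max_{\|\epsilon\|_2 \le \rho} \; L_{\mathrm{train}}(w + \epsilon),
\qquad \text{with perturbation radius } \rho > 0 .
```

The inner maximization requires an extra forward/backward pass per update, which is the per-step overhead of SAM-like methods that the truncated sentence of the abstract refers to.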

Cited by 3 publications (4 citation statements)
References 21 publications

“…There have also been several simple strategies to achieve a smaller maximum Hessian eigenvalue, such as choosing a large learning rate [12,33,46] and a smaller batch size [32,46,62]. Sharpness-Aware Minimization (SAM) [20] and its variants [17,18,39,44,49,54,83,86] are representative training algorithms that seek flat minima for better generalization. However, their definition of flatness is limited to zeroth-order flatness.…”
Section: Flat Minima and Generalization (mentioning, confidence: 99%)
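The "zeroth-order flatness" mentioned above is commonly written as the worst-case loss increase within a radius-ρ ball around the weights; a standard way to express it (assuming the usual definition, not quoted from the citing paper) is:

```latex
R^{(0)}_{\rho}(w) \;=\; \max_{\|\epsilon\|_2 \le \rho} L(w + \epsilon) \;-\; L(w).
```

It compares only loss values (zeroth-order information) inside the neighborhood, which is the limitation the citing paper contrasts with higher-order notions of flatness.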
“…In particular, some works discuss the connection between the geometry of the loss landscape and generalization [20,23,31]. A branch of effective approaches, Sharpness-Aware Minimization (SAM) [20] and its variants [17,18,39,49,54,83], minimizes the worst-case loss within a perturbation radius, which we call zeroth-order flatness. It has been proven that optimizing zeroth-order flatness leads to lower generalization error and achieves state-of-the-art performance on various image classification tasks [20,44,86].…”
Section: Introduction (mentioning, confidence: 99%)
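The statement above describes SAM's inner maximization over a perturbation radius. A minimal sketch of the resulting two-step update, using the standard first-order approximation of the inner maximizer on a toy least-squares problem (names such as `toy_loss`, `rho`, and `lr` are illustrative assumptions, not from the paper), might look like:

```python
import numpy as np

# Toy regression data: 128 samples, 10 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(128, 10))
true_w = rng.normal(size=10)
y = X @ true_w + 0.1 * rng.normal(size=128)

def toy_loss(w):
    residual = X @ w - y
    return 0.5 * np.mean(residual ** 2)

def toy_grad(w):
    residual = X @ w - y
    return X.T @ residual / len(y)

rho, lr = 0.05, 0.1   # perturbation radius and learning rate (assumed values)
w = np.zeros(10)
for _ in range(200):
    g = toy_grad(w)                               # gradient at the current weights
    eps = rho * g / (np.linalg.norm(g) + 1e-12)   # ascend to the approximate worst case on the rho-ball
    g_sam = toy_grad(w + eps)                     # gradient at the perturbed point
    w = w - lr * g_sam                            # descend with the sharpness-aware gradient

print(f"final training loss: {toy_loss(w):.4f}")
```

The only difference from plain gradient descent is the intermediate ascent step, which is also why SAM-style training roughly doubles the per-step cost relative to standard training.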
“…This proxy of sharpness lends itself to easy computation, unlike the measures of sharpness described earlier. SAM has sparked interest in sharpness-aware training, resulting in several variants [24,25,26,27,28].…”
Section: Introduction (mentioning, confidence: 99%)
“…Thus, in SAM, the loss function is modified in a way that encourages convergence to flatter regions of the loss landscape. SAM has been shown to be empirically successful in numerous tasks (Bahri et al., 2021; Behdin et al., 2022; Chen et al., 2021) and has been extended to several variants (Du et al., 2022; Zhuang et al., 2022). As a result, there has been growing interest in understanding the theoretical underpinnings of SAM.…”
Section: Introduction (mentioning, confidence: 99%)