2022
DOI: 10.4208/jml.220404

Beyond the Quadratic Approximation: The Multiscale Structure of Neural Network Loss Landscapes

Abstract: A quadratic approximation of neural network loss landscapes has been extensively used to study the optimization process of these networks. Though it usually holds in a very small neighborhood of the minimum, it cannot explain many phenomena observed during the optimization process. In this work, we study the structure of neural network loss functions and its implication on optimization in a region beyond the reach of a good quadratic approximation. Numerically, we observe that neural network loss functions po…
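For context, the quadratic approximation referred to in the abstract is the second-order Taylor expansion of the loss around a minimum; the notation below (θ* for the minimum, H for the Hessian there) is our own illustration, not the paper's:

    L(\theta) \approx L(\theta^*) + \tfrac{1}{2}(\theta - \theta^*)^{\top} H (\theta - \theta^*), \qquad H = \nabla^2 L(\theta^*).

The paper's observation is that this expansion is reliable only very close to θ*, while the training dynamics of interest occur in a region where the loss instead grows subquadratically and shows a separate-scales structure, as the citing statements below describe.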

Cited by 3 publications
(7 citation statements)
References 11 publications
“…• We propose an Implicit Regularization Enhancement (IRE) framework to speed up the convergence towards flatter minima. As suggested by works like Blanc et al. (2020) and Ma et al. (2022), the implicit sharpness reduction often occurs at a very slow pace along flat directions. Inspired by this picture, IRE specifically accelerates the dynamics along flat directions, while keeping the dynamics along sharp directions unchanged.…”
Section: Introduction
confidence: 77%
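To make the quoted IRE idea concrete, here is a minimal sketch of the kind of update it describes, assuming the sharp subspace is spanned by a few top Hessian eigenvectors passed in as an orthonormal basis; the function name, the kappa factor, and this particular splitting are our illustrative assumptions, not details taken from the cited paper.

import numpy as np

def ire_style_update(theta, grad, sharp_basis, lr=0.01, kappa=5.0):
    # One illustrative IRE-style step: amplify the flat-direction component
    # of the gradient while leaving the sharp-direction component unchanged.
    #   theta       : current parameters, shape (d,)
    #   grad        : gradient of the loss at theta, shape (d,)
    #   sharp_basis : orthonormal columns spanning the sharp subspace
    #                 (e.g. a few top Hessian eigenvectors), shape (d, k)
    #   kappa       : extra acceleration applied to flat directions
    #                 (hypothetical knob, not from the cited paper)
    g_sharp = sharp_basis @ (sharp_basis.T @ grad)   # projection onto sharp subspace
    g_flat = grad - g_sharp                          # orthogonal complement: flat directions
    return theta - lr * (g_sharp + (1.0 + kappa) * g_flat)

In practice the sharp basis would itself have to be estimated, for example by power iteration on Hessian-vector products; the sketch sidesteps that and takes it as given.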
“…Wu et al. (2018) and Ma and Ying (2021) provided an explanation of implicit sharpness regularization from a dynamical stability perspective. Moreover, in-depth analysis of SGD dynamics near global minima shows that SGD noise (Blanc et al., 2020; Ma et al., 2022; Damian et al., 2021) and edge-of-stability (EoS)-driven (Wu et al., 2018; Cohen et al., 2021) oscillations (Even et al., 2024) can drive SGD/GD towards flatter minima. Additional studies explored how training components, including the learning rate and batch size (Jastrzębski et al., 2017), normalization (Lyu et al., 2022), and cyclic learning rates (Wang and Wu, 2023), influence this sharpness regularization.…”
Section: Related Work
confidence: 99%
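For readers unfamiliar with the "edge of stability" mentioned above, the relevant threshold is the classical stability condition of gradient descent on a quadratic loss; the derivation below is a standard illustration in our own notation, not taken from the cited works:

    \theta_{t+1} - \theta^* = (I - \eta H)(\theta_t - \theta^*) \quad\Longrightarrow\quad \text{stable iff } |1 - \eta \lambda_i| < 1 \text{ for every eigenvalue } \lambda_i \text{ of } H, \text{ i.e. } \eta < 2/\lambda_{\max}.

"Edge of stability" refers to the empirical observation that full-batch gradient descent tends to hover near η λ_max ≈ 2 rather than staying safely below that threshold.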
“…Another similar idea that focuses on local properties of the loss landscape has also been contributive. Another work (Ma et al. 2022) extends the existing literature on the optimization of neural network loss functions by addressing the limitations of the quadratic approximation and emphasizing the importance of the multiscale structure. Their work contributes to the field by empirically demonstrating the subquadratic growth and separate-scales structure, offering explanations for intriguing training phenomena.…”
Section: Related Work
confidence: 99%
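The "subquadratic growth" this statement refers to can be probed numerically: measure L(θ* + t v) along a direction v leaving a minimum θ* and fit the exponent p in L − L(θ*) ∝ t^p; a quadratic landscape gives p ≈ 2, while subquadratic growth shows up as p < 2. A small sketch of that measurement follows, with loss_fn, theta_star, and v as placeholders of our own and a deliberately subquadratic toy loss for the demo:

import numpy as np

def growth_exponent(loss_fn, theta_star, v, ts):
    # Fit p in L(theta* + t v) - L(theta*) ~ c * t**p by log-log least squares.
    # p close to 2 means locally quadratic; p clearly below 2 indicates
    # subquadratic growth along the direction v.
    base = loss_fn(theta_star)
    excess = np.array([loss_fn(theta_star + t * v) - base for t in ts])
    slope, _ = np.polyfit(np.log(ts), np.log(excess), deg=1)
    return slope

# Demo on a 1-D toy loss |x|^1.5, which is subquadratic by construction:
toy_loss = lambda x: np.abs(x).item() ** 1.5
ts = np.geomspace(1e-2, 1.0, 20)
print(growth_exponent(toy_loss, np.zeros(1), np.ones(1), ts))  # ~1.5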
“…We note that the analysis only applies to stochastic gradient descent. In the case of full gradient descent, there have been several recent works showing that the quadratic approximation model might be too simplistic (Ma et al., 2022; Damian et al., 2022; Cohen et al., 2021).…”
Section: Related Work
confidence: 99%