2019
DOI: 10.48550/arxiv.1907.10732
Preprint

Hessian based analysis of SGD for Deep Nets: Dynamics and Generalization

Citations: cited by 1 publication (4 citation statements)
References: 22 publications
“…The diagonal elements have an interesting empirical behavior (Section 5): a small number of diagonal elements have relatively high values and most have quite small values. Such behavior aligns well with recent results on the eigen-spectrum of the Hessian [Li et al., 2019; Sagun et al., 2016]. Further, all the diagonal elements decrease as more samples are used for training.…”
Section: Main Result: Deterministic Smooth Predictors (supporting)
confidence: 91%
“…One concern in using the Hessian $H^{\theta^\dagger}_{l,\phi}$ is its scale dependence [Dinh et al., 2017], but we prove that our bound is scale invariant (Appendix B). While $H^{\theta^\dagger}_{l,\phi}$ could have been constructed from the Hessian eigen-values [Li et al., 2019; Sagun et al., 2016], the resulting bound would have been scale dependent and hence undesirable. Further, the diagonals of the Hessian are much easier to numerically compute compared to the eigen-values.…”
Section: Main Result: Deterministic Smooth Predictors (mentioning)
confidence: 99%
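
The remark in the statement above, that the Hessian diagonal is easier to compute than the eigenvalues, can be illustrated with a minimal sketch. This is not the cited authors' procedure: `hessian_diagonal` and `toy_loss` are hypothetical names, and central finite differences are used here purely as an assumption for illustration. Each diagonal entry needs only two extra loss evaluations, so the whole diagonal costs 2n + 1 evaluations for n parameters, whereas eigenvalues generally require the full n × n matrix or repeated Hessian-vector products.

```python
def hessian_diagonal(f, x, h=1e-3):
    """Approximate diag(Hessian of f at x) with central second differences.

    Assumption: f is a smooth scalar function of a list of floats.
    Cost: 2 * len(x) + 1 evaluations of f; no off-diagonal terms are formed.
    """
    fx = f(x)
    diag = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h
        x_minus[i] -= h
        # d^2 f / dx_i^2 ~ (f(x + h e_i) - 2 f(x) + f(x - h e_i)) / h^2
        diag.append((f(x_plus) - 2.0 * fx + f(x_minus)) / (h * h))
    return diag


def toy_loss(theta):
    # Hypothetical two-parameter loss with exact Hessian [[6, 1], [1, 1]].
    a, b = theta
    return 3.0 * a * a + 0.5 * b * b + a * b


diag = hessian_diagonal(toy_loss, [1.0, -2.0])
print(diag)  # approximately [6.0, 1.0]
```

For a quadratic loss such as `toy_loss`, the central difference is exact up to floating-point rounding, which makes it a convenient sanity check before applying the same idea to a real model.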