ViViT: Curvature access through the generalized Gauss-Newton's low-rank structure

Dangel, Felix; Tatzel, Lukas; Hennig, Philipp

doi:10.48550/arxiv.2106.02624

Cited by 1 publication

(1 citation statement)

References 7 publications

“…The intrinsic low rank structure of the (empirical) Fisher has been exploited in a number of setups by a number of papers including (Agarwal et al, 2019; Goldfarb et al, 2020; Immer et al, 2021; Dangel et al, 2021).…”

Section: Generally Related Work (mentioning)

confidence: 99%

Gradient Descent on Neurons and its Link to Approximate Second-Order Optimization

Benzing

2022

Preprint

Second-order optimizers are thought to hold the potential to speed up neural network training, but due to the enormous size of the curvature matrix, they typically require approximations to be computationally tractable. The most successful family of approximations are Kronecker-Factored, block-diagonal curvature estimates (KFAC). Here, we combine tools from prior work to evaluate exact second-order updates with careful ablations to establish a surprising result: Due to its approximations, KFAC is not closely related to second-order updates, and in particular, it significantly outperforms true second-order updates. This challenges widely held beliefs and immediately raises the question of why KFAC performs so well. We answer this question by showing that KFAC approximates a first-order algorithm, which performs gradient descent on neurons rather than weights. Finally, we show that this optimizer often improves over KFAC in terms of computational cost and data efficiency.
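The citation statement above refers to the low-rank structure of the (empirical) Fisher. As a rough illustration only, and not the method of any of the cited papers, the sketch below assumes the empirical Fisher is formed from N per-sample gradients stacked in an N x D matrix G, so that F = (1/N) GᵀG has rank at most N. Its nonzero eigenpairs can then be obtained from the small N x N Gram matrix (1/N) GGᵀ without ever forming the D x D matrix. The function name `empirical_fisher_spectrum` and the random test data are hypothetical.

```python
import numpy as np


def empirical_fisher_spectrum(per_sample_grads):
    """Nonzero eigenpairs of F = (1/N) G^T G, computed via the N x N Gram matrix.

    per_sample_grads: array of shape (N, D), one flattened gradient per row.
    Returns (eigenvalues, directions), where directions has shape (D, K) and
    its columns are unit-norm eigenvectors of F.
    """
    G = np.asarray(per_sample_grads, dtype=float)
    n = G.shape[0]

    # Small N x N Gram matrix; it shares its nonzero eigenvalues with F.
    gram = G @ G.T / n
    evals, evecs = np.linalg.eigh(gram)  # eigenvalues in ascending order

    # Drop numerically zero eigenvalues (rank may be below N).
    keep = evals > 1e-10 * max(evals.max(), 1.0)
    evals, evecs = evals[keep], evecs[:, keep]

    # Map each Gram eigenvector u to parameter space: v = G^T u / sqrt(N * lambda).
    directions = G.T @ evecs / np.sqrt(n * evals)
    return evals, directions


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    G = rng.standard_normal((4, 50))  # hypothetical: N = 4 samples, D = 50 parameters
    evals, V = empirical_fisher_spectrum(G)

    # Check against the explicit D x D matrix (only feasible for small D).
    F = G.T @ G / G.shape[0]
    print(np.allclose(F @ V, V * evals))
```

The explicit matrix F is built here only to check the result; for a real network D is far too large for that, which is why working in the N-dimensional Gram space is attractive.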
