Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining 2020
DOI: 10.1145/3394486.3403265

Rich Information is Affordable: A Systematic Performance Analysis of Second-order Optimization Using K-FAC

Citations: cited by 9 publications (16 citation statements)
References: 8 publications
“…Compared to S-SGD, D-KFAC requires six extra time-consuming operations at each layer: four computing operations (compute Kronecker factors $A^{p}_{l-1}$ and $G^{p}_{l}$ and their inverses) and two communication operations (aggregation of $A^{p}_{l-1}$ and $G^{p}_{l}$). Due to the high computational cost of inverting matrices, recent work has proposed the distributed algorithm to reduce the computation time of inverting matrices [13,20,22]. As shown in Eq.…”
Section: B Distributed KFAC (D-KFAC) | mentioning
confidence: 99%
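
The six per-layer operations named in this statement can be sketched for a single layer as follows. This is a minimal NumPy illustration under stated assumptions, not the cited paper's implementation: the workers are simulated inside one process, `all_reduce_mean` is a hypothetical stand-in for a real all-reduce, the damping term and every function name are illustrative, and reading the superscript p as a worker index is itself an assumption.

```python
import numpy as np

def local_factors(acts, grads):
    """Compute one worker's Kronecker factors for a layer (compute ops 1 and 2)."""
    n = acts.shape[0]
    A = acts.T @ acts / n      # input-activation factor, shape (d_in, d_in)
    G = grads.T @ grads / n    # output-gradient factor, shape (d_out, d_out)
    return A, G

def all_reduce_mean(mats):
    """Hypothetical stand-in for an all-reduce: average per-worker matrices."""
    return sum(mats) / len(mats)

def damped_inverse(M, damping):
    """Invert a factor after adding damping (assumed here for numerical stability)."""
    return np.linalg.inv(M + damping * np.eye(M.shape[0]))

def kfac_precondition(grad_W, A, G, damping=1e-3):
    """Compute ops 3 and 4 (the two inverses), then precondition the gradient."""
    A_inv = damped_inverse(A, damping)
    G_inv = damped_inverse(G, damping)
    return G_inv @ grad_W @ A_inv

rng = np.random.default_rng(0)
workers, batch, d_in, d_out = 4, 32, 5, 3

# Each simulated worker holds its own activations and back-propagated gradients.
per_worker = [
    (rng.normal(size=(batch, d_in)), rng.normal(size=(batch, d_out)))
    for _ in range(workers)
]
factors = [local_factors(a, g) for a, g in per_worker]

# Communication ops 1 and 2: aggregate each factor across workers.
A = all_reduce_mean([f[0] for f in factors])
G = all_reduce_mean([f[1] for f in factors])

# The gradient aggregation is the one S-SGD already performs.
grad_W = all_reduce_mean([g.T @ a / batch for a, g in per_worker])
update = kfac_precondition(grad_W, A, G)
```

The last line applies the usual K-FAC preconditioning, G^{-1} ∇W A^{-1}, which is why both inverses are needed at every layer.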
“…(13), the inverse operations of Kronecker factors at different layers have no dependency with each other. In existing state-of-the-art solutions [13,20,22], the workloads of different layers in computing inverses are distributed to multiple GPUs (with a concept of model parallelism), and their results are finally gathered to all GPUs for preconditioning gradients. An example is shown in the right-hand side of Fig.…”
Section: B Distributed KFAC (D-KFAC) | mentioning
confidence: 99%
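
Because the per-layer inverses are independent, the distribution scheme described in this statement can be illustrated roughly as below. The round-robin assignment is an assumed policy chosen for simplicity (the statement does not say how the cited systems assign layers), the GPUs are simulated as loop iterations, and `assign_layers`, `invert_owned`, and `all_gather` are hypothetical names rather than any library's API.

```python
import numpy as np

def assign_layers(num_layers, num_workers):
    """Round-robin layer-to-worker map (an assumed policy)."""
    return {layer: layer % num_workers for layer in range(num_layers)}

def invert_owned(factors, owner, rank, damping=1e-3):
    """Worker `rank` inverts only the factors of the layers it owns."""
    out = {}
    for layer, (A, G) in factors.items():
        if owner[layer] == rank:
            A_inv = np.linalg.inv(A + damping * np.eye(A.shape[0]))
            G_inv = np.linalg.inv(G + damping * np.eye(G.shape[0]))
            out[layer] = (A_inv, G_inv)
    return out

def all_gather(partials):
    """Hypothetical stand-in for an all-gather: merge every worker's results."""
    merged = {}
    for part in partials:
        merged.update(part)
    return merged

def random_spd(rng, d):
    """Random symmetric positive semi-definite matrix, used as a toy factor."""
    X = rng.normal(size=(2 * d, d))
    return X.T @ X / (2 * d)

rng = np.random.default_rng(0)
dims = [(4, 3), (3, 6), (6, 2), (2, 5), (5, 1)]   # toy (d_in, d_out) per layer
factors = {
    i: (random_spd(rng, d_in), random_spd(rng, d_out))
    for i, (d_in, d_out) in enumerate(dims)
}

num_workers = 2
owner = assign_layers(len(factors), num_workers)

# Each worker handles a disjoint subset of layers; no cross-layer dependency.
partials = [invert_owned(factors, owner, rank) for rank in range(num_workers)]

# After the gather, every worker would hold all inverses for preconditioning.
inverses = all_gather(partials)
```

A load-aware assignment (for example, weighting layers by factor size) would be a natural refinement, since inversion cost grows quickly with matrix dimension.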