2019
DOI: 10.48550/arxiv.1902.00179
Preprint

Compressing Gradient Optimizers via Count-Sketches

Ryan Spring, Anastasios Kyrillidis, Vijai Mohan, et al.

Abstract: Many popular first-order optimization methods (e.g., Momentum, AdaGrad, Adam) accelerate the convergence rate of deep learning models. However, these algorithms require auxiliary parameters, which cost additional memory proportional to the number of parameters in the model. The problem is becoming more severe as deep learning models continue to grow larger in order to learn from complex, large-scale datasets. Our proposed solution is to maintain a linear sketch to compress the auxiliary variables. We demonstra…
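The abstract's core idea, replacing dense auxiliary optimizer state with a small linear sketch, can be illustrated with a minimal Count-Sketch. The Python below is a rough sketch under stated assumptions, not the authors' implementation: the class name, the precomputed hash lookup tables, and the depth/width parameters are all illustrative choices.

import numpy as np

class CountSketch:
    # A depth x width table with per-row bucket and sign hashes.
    # update() adds signed values into hashed buckets; query() returns the
    # median of the signed bucket contents, an unbiased estimate of the sum.
    def __init__(self, depth, width, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.depth, self.width = depth, width
        self.table = np.zeros((depth, width))
        # Illustrative shortcut: precomputed hash/sign lookup tables for a
        # fixed dimension, instead of the universal hash functions a real
        # implementation would use to avoid this O(depth * dim) storage.
        self.buckets = rng.integers(0, width, size=(depth, dim))
        self.signs = rng.choice([-1.0, 1.0], size=(depth, dim))

    def update(self, indices, values):
        # Accumulate `values` at coordinates `indices` into every row.
        for j in range(self.depth):
            np.add.at(self.table[j], self.buckets[j, indices],
                      self.signs[j, indices] * values)

    def query(self, indices):
        # Estimate the accumulated value at each coordinate in `indices`.
        rows = [self.signs[j, indices] * self.table[j, self.buckets[j, indices]]
                for j in range(self.depth)]
        return np.median(np.stack(rows), axis=0)

# Example: approximate a 100,000-dimensional accumulator with a 5 x 5,000 table.
cs = CountSketch(depth=5, width=5_000, dim=100_000)
idx = np.array([3, 42, 7_777])
cs.update(idx, np.array([0.5, -1.0, 2.0]))
approx = cs.query(idx)  # noisy estimates of 0.5, -1.0, 2.0

In this spirit, rather than storing a dense momentum buffer with one entry per model parameter, one could keep a table whose size is far smaller than the parameter count, add each step's gradient coordinates into it, and query it for approximate momentum values of the coordinates being updated.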

Cited by 1 publication (1 citation statement)
References 19 publications
“…FetchSGD overall aids with communication constraints of an FL environment by compressing the gradient that is based on the client's local data. The data structure Count Sketch [74] is used to compress the gradient before it is uploaded to the central server. The Count Sketch is also used for error accumulation.…”
Section: A Decentralized Deep Learning Model
Mentioning, confidence: 99%
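The quoted FetchSGD workflow (clients sketch gradients, the server aggregates the sketches, and the residual error stays in a sketch) can be made concrete with a standalone toy example. This is an assumption-heavy simplification, not FetchSGD's actual code: it uses a single hash row rather than a multi-row Count Sketch with median estimates, and the function names (make_hashes, sketch, unsketch) are made up for illustration.

import numpy as np

def make_hashes(dim, width, seed=0):
    # Shared bucket/sign hashes, fixed across clients and server.
    rng = np.random.default_rng(seed)
    return rng.integers(0, width, size=dim), rng.choice([-1.0, 1.0], size=dim)

def sketch(vec, buckets, signs, width):
    # Compress a dense gradient into a width-sized signed sketch (a linear map).
    table = np.zeros(width)
    np.add.at(table, buckets, signs * vec)
    return table

def unsketch(table, buckets, signs):
    # Approximate per-coordinate values from an (aggregated) sketch.
    return signs * table[buckets]

dim, width, k = 10_000, 1_000, 50
buckets, signs = make_hashes(dim, width)

# Clients upload sketched gradients; the server sums them (sketching is linear).
client_grads = [np.random.default_rng(i).normal(size=dim) for i in range(4)]
server_sketch = sum(sketch(g, buckets, signs, width) for g in client_grads)

# Server recovers approximate top-k coordinates and applies only those.
estimates = unsketch(server_sketch, buckets, signs)
topk = np.argsort(np.abs(estimates))[-k:]
update = np.zeros(dim)
update[topk] = estimates[topk]

# Error accumulation: subtract the applied update's sketch so the residual
# remains in the server sketch and can surface in a later round.
server_sketch -= sketch(update, buckets, signs, width)

Because the sketch is a linear map, summing the clients' sketches equals sketching the sum of their gradients, which is what makes the server-side aggregation and the in-sketch error feedback possible.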