2020 IEEE 21st International Workshop on Signal Processing Advances in Wireless Communications (SPAWC)
DOI: 10.1109/spawc48557.2020.9153887

Ordered Gradient Approach for Communication-Efficient Distributed Learning

Abstract: Large-scale distributed training of Deep Neural Networks (DNNs) on state-of-the-art platforms is expected to be severely communication constrained. To overcome this limitation, numerous gradient compression techniques have been proposed and have demonstrated high compression ratios. However, most existing methods do not scale well to large-scale distributed systems (due to gradient build-up) and/or fail to evaluate model fidelity (test accuracy) on large datasets. To mitigate these issues, we propose a new com…

Cited by 9 publications (4 citation statements).
References: 20 publications.
“…The second class of approaches focuses on reducing the number of communication iterations by eliminating the communication between some of the workers and the master node in some iterations [16]. The work [16] proposed the lazily aggregated gradient (LAG) method for communication-efficient distributed learning in master-worker architectures. In LAG, each worker reports its gradient vector to the master node only if the gradient change since the last communication iteration is large enough.…”
Section: A. Literature Survey
Mentioning confidence: 99%
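To make the LAG rule quoted above concrete, here is a minimal Python sketch of a "lazy" worker, assuming a simple norm-based trigger; the class and function names (LazyWorker, maybe_upload, master_step) are hypothetical and this is not the reference implementation from [16].

```python
import numpy as np

# Sketch of a LAG-style worker: it uploads a fresh gradient only when the
# gradient has changed enough since the last one it actually communicated.
class LazyWorker:
    def __init__(self, grad_fn, threshold):
        self.grad_fn = grad_fn        # computes this worker's local gradient at w
        self.threshold = threshold    # how much change triggers a new upload
        self.last_sent = None         # last gradient actually sent to the master

    def maybe_upload(self, w):
        g = self.grad_fn(w)
        if self.last_sent is None or np.linalg.norm(g - self.last_sent) > self.threshold:
            self.last_sent = g
            return g                  # communicate the fresh gradient
        return None                   # censor; master reuses the stale copy

def master_step(w, workers, lr=0.1):
    # Master aggregates fresh uploads plus stale copies from censored workers.
    total = np.zeros_like(w)
    for wk in workers:
        g = wk.maybe_upload(w)
        total += g if g is not None else wk.last_sent
    return w - lr * total / len(workers)

# Toy usage: two workers holding gradients of (w-1)^2 and (w+1)^2.
f1 = lambda w: 2 * (w - 1.0)
f2 = lambda w: 2 * (w + 1.0)
w = np.array([5.0])
workers = [LazyWorker(f1, 0.5), LazyWorker(f2, 0.5)]
for _ in range(50):
    w = master_step(w, workers)
```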
“…Censorship in distributed learning reduces communication, but some useful information may be lost. [127] studies an ordered gradient method that uses sorting to eliminate some of the worker-to-server upstream communication typically required in gradient descent methods. [128] and [129] study gradient coding to reduce communication costs while also reducing the latency caused by slow-running machines.…”
Section: A. Communication Cost
Mentioning confidence: 99%
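As a rough illustration of the ordering idea attributed to [127] in the statement above, the sketch below assumes the "order" is by local gradient norm and that only the top-k workers upload in a given iteration; the actual selection rule and scaling in the paper may differ, so treat this as a simplified simulation rather than the authors' method.

```python
import numpy as np

# Ordering-based round (simplified): rank workers by local gradient norm and
# aggregate only the k largest, so only those k gradients need to be uploaded.
def ordered_gradient_step(w, local_grad_fns, k, lr=0.1):
    grads = [f(w) for f in local_grad_fns]        # each worker's local gradient
    norms = [np.linalg.norm(g) for g in grads]    # scalar "order statistics"
    top_k = np.argsort(norms)[-k:]                # indices of the k largest norms
    agg = sum(grads[i] for i in top_k) / k        # only these are communicated
    return w - lr * agg
```

In this simplified view, the upstream saving comes from the fact that each round moves only k full gradient vectors to the server, while the remaining workers stay silent.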
“…To the best of our knowledge, ordered transmissions have not been applied to federated learning in a completely distributed setting. Some extensions to the work in [21] have been developed, including the application of ordering to quickest change detection in sensor networks [22], nearest-neighbor learning [23], and ordered gradient descent (GD) in a worker-server architecture setting [24].…”
Section: Introduction
Mentioning confidence: 99%