2018
DOI: 10.48550/arxiv.1808.07576
Preprint
Cooperative SGD: A Unified Framework for the Design and Analysis of Communication-Efficient SGD Algorithms

Jianyu Wang,
Gauri Joshi

Abstract: Communication-efficient SGD algorithms, which allow nodes to perform local updates and periodically synchronize local models, are highly effective in improving the speed and scalability of distributed SGD. However, a rigorous convergence analysis and comparative study of different communication-reduction strategies remains a largely open problem. This paper presents a unified framework called Cooperative SGD that subsumes existing communication-efficient SGD algorithms such as periodic-averaging, elastic-averag…
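The core pattern the abstract describes, workers taking several local SGD steps and then periodically mixing their models, can be illustrated with a minimal sketch. This is only an illustration under assumed names: the function local_update_sgd, the parameter tau for the number of local steps, and the mixing matrix W (defaulting to full averaging) are not from the paper, and the sketch omits the auxiliary variables used by elastic-averaging variants.

```python
import numpy as np

def local_update_sgd(grad_fn, x0, num_workers=4, tau=10, rounds=50, lr=0.05,
                     mixing_matrix=None, seed=0):
    """Run tau local SGD steps on each worker, then mix the local models.

    grad_fn(x, worker_id, rng) must return a stochastic gradient of the
    shared objective at the local model x. All names here are illustrative.
    """
    rng = np.random.default_rng(seed)
    # Full averaging recovers periodic-averaging SGD; a sparser doubly
    # stochastic matrix would correspond to decentralized mixing.
    W = (np.full((num_workers, num_workers), 1.0 / num_workers)
         if mixing_matrix is None else np.asarray(mixing_matrix, dtype=float))
    models = np.tile(np.asarray(x0, dtype=float), (num_workers, 1))

    for _ in range(rounds):
        for _ in range(tau):                 # tau local steps, no communication
            for i in range(num_workers):
                models[i] -= lr * grad_fn(models[i], i, rng)
        models = W @ models                  # communication: mix local models
    return models.mean(axis=0)

# Toy usage: each worker sees noisy gradients of the same quadratic objective.
if __name__ == "__main__":
    grad = lambda x, i, rng: 2.0 * (x - 3.0) + 0.1 * rng.standard_normal(x.shape)
    print(local_update_sgd(grad, x0=np.zeros(5)))  # should approach [3, 3, 3, 3, 3]
```

Setting tau to 1 with full averaging reduces the sketch to fully synchronous SGD, which is the sense in which a single framework can cover several communication-reduction strategies.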

Cited by 103 publications (200 citation statements)
References 28 publications
“…In this section, we provide some auxiliary results for the proof of Theorem 1. We first give an alternative form of the reconstruction error derived from the condition (7) and the performance guarantee (6). Lemma 3.…”
Section: A. Auxiliary Results (mentioning)
confidence: 99%
“…The first category aims to reduce the number of communication rounds, based on the idea that each edge device runs multiple local SGD steps in parallel before sending the local updates to the server for aggregation. This approach has also been called FedAvg [1] in federated learning, and its convergence has been studied in [5,6,7]. Another line of work investigates lazy/adaptive upload of information, i.e., local gradients are uploaded only when found to be informative enough [8].…”
Section: Introduction (mentioning)
confidence: 99%
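The statement above describes the FedAvg pattern of running multiple local SGD steps on each client before server-side aggregation. The following sketch shows one such round under assumed names (fedavg_round, local_steps) and a least-squares loss chosen purely for concreteness; it is not the implementation from [1] or from this paper.

```python
import numpy as np

def fedavg_round(global_model, client_data, local_steps=5, lr=0.1):
    """One FedAvg-style round: local SGD on each client, then weighted averaging.

    client_data is a list of (X, y) pairs; names and defaults are illustrative.
    """
    local_models, sample_counts = [], []
    for X, y in client_data:
        w = global_model.copy()
        for _ in range(local_steps):
            grad = 2.0 * X.T @ (X @ w - y) / len(y)  # gradient of mean squared error
            w -= lr * grad
        local_models.append(w)
        sample_counts.append(len(y))
    # Server aggregation: average client models weighted by local sample counts.
    return np.average(np.stack(local_models), axis=0, weights=sample_counts)

# Toy usage: three clients whose data share the same underlying linear model.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w_true = np.array([1.0, -2.0, 0.5])
    clients = []
    for n in (20, 40, 60):
        X = rng.standard_normal((n, 3))
        clients.append((X, X @ w_true + 0.01 * rng.standard_normal(n)))
    w = np.zeros(3)
    for _ in range(30):
        w = fedavg_round(w, clients)
    print(w)  # should be close to w_true
```

Weighting the average by local sample counts mirrors the standard FedAvg aggregation rule; the lazy/adaptive upload schemes mentioned above would instead add a test deciding whether a client's update is informative enough to send at all.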
“…FedAvg is able to reduce communication costs by training clients for multiple rounds locally. Several works have shown the convergence of FedAvg under different settings, with both homogeneous (IID) data [37,41] and heterogeneous (non-IID) data [23,3,44], even with partial client participation. Specifically, [44] demonstrated that LocalSGD achieves O(1/√(NQ)) convergence for non-convex optimization, and [23] established a convergence rate of O(1/Q) for strongly convex problems on FedAvg, where Q is the number of local SGD steps and N is the number of participating clients.…”
Section: Related Work (mentioning)
confidence: 99%
“…However, the success of these algorithms has only been demonstrated empirically (e.g., [6,13]). Unlike standard FL, which has received rigorous theoretical analysis [37,3,44,23], the convergence of heterogeneous FL with adaptive online model pruning is still an open question. Little is known about whether such algorithms converge to a solution of standard FL.…”
Section: Introduction (mentioning)
confidence: 99%
“…In particular, the iteration complexity and convergence of FedAve are carefully analyzed in [20]. More generally, a unified analysis of the class of communication-efficient SGD algorithms is presented in [23]. Various other federated optimization methods have also been proposed that address different drawbacks of FedAve.…”
Section: Communication Cost (mentioning)
confidence: 99%