2020
DOI: 10.48550/arXiv.2011.01697
Preprint

A Linearly Convergent Algorithm for Decentralized Optimization: Sending Less Bits for Free!

Abstract: Decentralized optimization methods enable on-device training of machine learning models without a central coordinator. In many scenarios communication between devices is energy demanding and time consuming and forms the bottleneck of the entire system. We propose a new randomized first-order method which tackles the communication bottleneck by applying randomized compression operators to the communicated messages. By combining our scheme with a new variance reduction technique that progressively throughout the…
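The abstract describes compressed message passing combined with a variance-reduction-style correction. As a rough illustration only (not the paper's algorithm), the sketch below shows a CHOCO-GD-style decentralized step in which each node compresses the difference between its local model and a publicly tracked copy; the rand-k compressor, the mixing matrix W, and the step sizes lr and gamma are assumed placeholders.

```python
import numpy as np

def rand_k(v, k, rng):
    """Unbiased rand-k sparsifier: keep k random coordinates, rescale by d/k."""
    d = v.size
    idx = rng.choice(d, size=k, replace=False)
    out = np.zeros_like(v)
    out[idx] = v[idx] * (d / k)
    return out

def compressed_decentralized_gd(grads, W, d, steps=500, lr=0.05, gamma=0.5, k=2, seed=0):
    """Illustrative decentralized GD where nodes exchange only compressed differences.

    grads : list of callables, grads[i](x) -> local gradient of node i (shape (d,))
    W     : symmetric doubly stochastic mixing matrix of the communication graph
    """
    rng = np.random.default_rng(seed)
    n = W.shape[0]
    x = np.zeros((n, d))      # local models
    xhat = np.zeros((n, d))   # publicly known (compressed) copies of the models
    for _ in range(steps):
        for i in range(n):
            x[i] -= lr * grads[i](x[i])                        # local gradient step
        q = np.stack([rand_k(x[i] - xhat[i], k, rng) for i in range(n)])
        xhat += q                                              # all nodes update the public copies
        x += gamma * (W @ xhat - xhat)                         # gossip on the public copies only
    return x.mean(axis=0)
```

Because only the differences x[i] − xhat[i] are compressed, the quantization error shrinks as the iterates stabilize, which is the intuition behind combining compression with a progressive, variance-reduction-like correction.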

Cited by 4 publications (8 citation statements)
References: 33 publications
“…Recalling the definition η = π_A^T π_B η, the above inequality can be rewritten as Γ₁η² + Γ₂η − Γ₃ < 0, with Γ₁, Γ₂ and Γ₃ defined in (17). Hence, it can be derived that η ≤…”
Section: Discussion (mentioning)
confidence: 99%
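The quoted bound on η is truncated. Assuming Γ₁ > 0 (an assumption; the excerpt does not state the signs of the constants), the standard quadratic-root argument gives:

```latex
\Gamma_1 \eta^2 + \Gamma_2 \eta - \Gamma_3 < 0,\ \Gamma_1 > 0
\;\Longrightarrow\;
\eta < \frac{-\Gamma_2 + \sqrt{\Gamma_2^2 + 4\Gamma_1 \Gamma_3}}{2\Gamma_1}.
```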
“…[flattened comparison table: directed-network support, linear convergence, and binary quantizer across [12], [14], [13], [15], [17]-[21], [22], [23], [2], [24], [25], and our work] Our work balances the convergence speed and the communication cost per iteration so that linear convergence can be guaranteed with only a few bits of quantization. Although the aforementioned quantized algorithms [17]-[19] converge linearly, they are designed only for undirected networks. Note that extending distributed algorithms from undirected networks to directed networks is non-trivial [22]-[25].…”
Section: References (mentioning)
confidence: 99%
“…However, the analysis relies on the assumption of bounded stochastic gradients. Free of this assumption, gradient difference compression is also provably able to reduce the compression noise, requiring the use of unbiased compressors [22,23,28,29]. Nevertheless, the influence of gradient difference compression on Byzantine-robustness has not yet been investigated.…”
Section: Related Work (mentioning)
confidence: 99%
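As a rough sketch of the gradient-difference-compression idea attributed to [22,23,28,29] (a DIANA-style construction; the compressor, variable names, and step sizes are illustrative assumptions, not the cited papers' exact schemes):

```python
import numpy as np

def unbiased_dither(v, rng):
    """One-level random dithering: E[output] = v, so the compressor is unbiased."""
    norm = np.linalg.norm(v)
    if norm == 0.0:
        return v.copy()
    keep = rng.random(v.shape) < np.abs(v) / norm
    return norm * np.sign(v) * keep

def diff_compression_step(grad_fns, x, h, lr, alpha, rng):
    """Workers send only Q(g_i - h_i); since E[h_i + Q(g_i - h_i)] = g_i, the update
    is unbiased, and the differences shrink as h_i tracks g_i, reducing compression noise."""
    estimates = []
    for i, grad_fn in enumerate(grad_fns):
        g_i = grad_fn(x)
        m_i = unbiased_dither(g_i - h[i], rng)   # compressed difference message
        estimates.append(h[i] + m_i)             # unbiased estimate of g_i
        h[i] = h[i] + alpha * m_i                # local reference drifts toward g_i
    return x - lr * np.mean(estimates, axis=0), h
```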
“…Variance reduction techniques have been widely used to reduce stochastic noise and accelerate the convergence of stochastic algorithms [32,33,21]. In [23,28], the combination of variance reduction and gradient difference compression is investigated. Variance reduction is also important for Byzantine-robustness.…”
Section: Related Work (mentioning)
confidence: 99%
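For reference, the SVRG-type estimator is the canonical variance-reduction construction alluded to here; a minimal sketch under the usual finite-sum assumptions (stoch_grad, x_snapshot, and full_grad_snapshot are hypothetical names):

```python
def svrg_gradient(stoch_grad, x, x_snapshot, full_grad_snapshot, idx):
    """Variance-reduced gradient estimator.

    stoch_grad(x, idx)  : gradient of the idx-th component function at x
    full_grad_snapshot  : full gradient evaluated once per epoch at x_snapshot
    The estimator is unbiased, and its variance vanishes as x -> x_snapshot.
    """
    return stoch_grad(x, idx) - stoch_grad(x_snapshot, idx) + full_grad_snapshot
```

The vanishing variance near the snapshot point is what enables linear convergence for finite-sum problems, which is why such estimators pair naturally with difference compression.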