2020
DOI: 10.48550/arXiv.2011.01697
Preprint

A Linearly Convergent Algorithm for Decentralized Optimization: Sending Less Bits for Free!

Abstract: Decentralized optimization methods enable on-device training of machine learning models without a central coordinator. In many scenarios communication between devices is energy demanding and time consuming and forms the bottleneck of the entire system. We propose a new randomized first-order method which tackles the communication bottleneck by applying randomized compression operators to the communicated messages. By combining our scheme with a new variance reduction technique that progressively throughout the…
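The abstract describes compressed message passing combined with a variance-reduction-style correction. As a rough illustration only (not the paper's algorithm), the sketch below shows a CHOCO-GD-style decentralized step in which each node compresses the difference between its local model and a publicly tracked copy; the rand-k compressor, the mixing matrix W, and the step sizes lr and gamma are assumed placeholders.

```python
import numpy as np

def rand_k(v, k, rng):
    """Unbiased rand-k sparsifier: keep k random coordinates, rescale by d/k."""
    d = v.size
    idx = rng.choice(d, size=k, replace=False)
    out = np.zeros_like(v)
    out[idx] = v[idx] * (d / k)
    return out

def compressed_decentralized_gd(grads, W, d, steps=500, lr=0.05, gamma=0.5, k=2, seed=0):
    """Illustrative decentralized GD where nodes exchange only compressed differences.

    grads : list of callables, grads[i](x) -> local gradient of node i (shape (d,))
    W     : symmetric doubly stochastic mixing matrix of the communication graph
    """
    rng = np.random.default_rng(seed)
    n = W.shape[0]
    x = np.zeros((n, d))      # local models
    xhat = np.zeros((n, d))   # publicly known (compressed) copies of the models
    for _ in range(steps):
        for i in range(n):
            x[i] -= lr * grads[i](x[i])                        # local gradient step
        q = np.stack([rand_k(x[i] - xhat[i], k, rng) for i in range(n)])
        xhat += q                                              # all nodes update the public copies
        x += gamma * (W @ xhat - xhat)                         # gossip on the public copies only
    return x.mean(axis=0)
```

Because only the differences x[i] − xhat[i] are compressed, the quantization error shrinks as the iterates stabilize, which is the intuition behind combining compression with a progressive, variance-reduction-like correction.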

Cited by 4 publications (8 citation statements)
References: 33 publications
“…Recalling the definition η = π_A^T π_B η, the above inequality can be rewritten as Γ₁η² + Γ₂η − Γ₃ < 0, with Γ₁, Γ₂ and Γ₃ defined in (17). Hence, it can be derived that η ≤…”
Section: Discussion (mentioning)
confidence: 99%
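The quoted bound on η is truncated. Assuming Γ₁ > 0 (an assumption; the excerpt does not state the signs of the constants), the standard quadratic-root argument gives:

```latex
\Gamma_1 \eta^2 + \Gamma_2 \eta - \Gamma_3 < 0,\ \Gamma_1 > 0
\;\Longrightarrow\;
\eta < \frac{-\Gamma_2 + \sqrt{\Gamma_2^2 + 4\Gamma_1 \Gamma_3}}{2\Gamma_1}.
```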
“…[flattened comparison table: directed-network support, linear convergence, and binary quantizer across [12], [14], [13], [15], [17]-[21], [22], [23], [2], [24], [25], and our work] Our work balances the convergence speed and the communication cost per iteration so that linear convergence can be guaranteed with only a few bits of quantization. Although the aforementioned quantized algorithms [17]-[19] converge linearly, they are designed only for undirected networks. Note that extending distributed algorithms from undirected networks to directed networks is non-trivial [22]-[25].…”
Section: References (mentioning)
confidence: 99%
“…However, the analysis relies on the assumption of bounded stochastic gradients. Free of this assumption, gradient difference compression is also provably able to reduce the compression noise, requiring the use of unbiased compressors [22,23,28,29]. Nevertheless, the influence of gradient difference compression on Byzantine-robustness has not yet been investigated.…”
Section: Related Work (mentioning)
confidence: 99%
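As a rough sketch of the gradient-difference-compression idea attributed to [22,23,28,29] (a DIANA-style construction; the compressor, variable names, and step sizes are illustrative assumptions, not the cited papers' exact schemes):

```python
import numpy as np

def unbiased_dither(v, rng):
    """One-level random dithering: E[output] = v, so the compressor is unbiased."""
    norm = np.linalg.norm(v)
    if norm == 0.0:
        return v.copy()
    keep = rng.random(v.shape) < np.abs(v) / norm
    return norm * np.sign(v) * keep

def diff_compression_step(grad_fns, x, h, lr, alpha, rng):
    """Workers send only Q(g_i - h_i); since E[h_i + Q(g_i - h_i)] = g_i, the update
    is unbiased, and the differences shrink as h_i tracks g_i, reducing compression noise."""
    estimates = []
    for i, grad_fn in enumerate(grad_fns):
        g_i = grad_fn(x)
        m_i = unbiased_dither(g_i - h[i], rng)   # compressed difference message
        estimates.append(h[i] + m_i)             # unbiased estimate of g_i
        h[i] = h[i] + alpha * m_i                # local reference drifts toward g_i
    return x - lr * np.mean(estimates, axis=0), h
```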
“…Variance reduction techniques have been widely used to reduce stochastic noise and accelerate the convergence of stochastic algorithms [32,33,21]. In [23,28], the combination of variance reduction and gradient difference compression is investigated. Variance reduction is also important for Byzantine-robustness.…”
Section: Related Work (mentioning)
confidence: 99%
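For reference, the SVRG-type estimator is the canonical variance-reduction construction alluded to here; a minimal sketch under the usual finite-sum assumptions (stoch_grad, x_snapshot, and full_grad_snapshot are hypothetical names):

```python
def svrg_gradient(stoch_grad, x, x_snapshot, full_grad_snapshot, idx):
    """Variance-reduced gradient estimator.

    stoch_grad(x, idx)  : gradient of the idx-th component function at x
    full_grad_snapshot  : full gradient evaluated once per epoch at x_snapshot
    The estimator is unbiased, and its variance vanishes as x -> x_snapshot.
    """
    return stoch_grad(x, idx) - stoch_grad(x_snapshot, idx) + full_grad_snapshot
```

The vanishing variance near the snapshot point is what enables linear convergence for finite-sum problems, which is why such estimators pair naturally with difference compression.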