2019
DOI: 10.48550/arxiv.1907.12205
Preprint

DETOX: A Redundancy-based Framework for Faster and More Robust Gradient Aggregation

Abstract: To improve the resilience of distributed training to worst-case, or Byzantine, node failures, several recent approaches have replaced gradient averaging with robust aggregation methods. Such techniques can have high computational costs, often quadratic in the number of compute nodes, and offer only limited robustness guarantees. Other methods have instead used redundancy to guarantee robustness, but can only tolerate a limited number of Byzantine failures. In this work, we present DETOX, a Byzantine-resilient dist…
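
The abstract describes pairing two stages: replicate each gradient task across a small group of workers, filter each group's replies by majority vote, and then apply a robust aggregator across the filtered group outputs. The snippet below is a minimal NumPy sketch of that pattern under assumed details (exact-match voting within groups, a Weiszfeld geometric median across groups); it illustrates the general idea, not the paper's exact algorithm.

```python
import numpy as np

def majority_vote(group_grads):
    # Workers in a redundant group were assigned the same mini-batch, so honest
    # workers return identical vectors; pick any vector repeated by a majority.
    for g in group_grads:
        matches = sum(np.array_equal(g, h) for h in group_grads)
        if matches > len(group_grads) // 2:
            return g
    # No majority (too many Byzantine replies): fall back to a coordinate-wise median.
    return np.median(np.stack(group_grads), axis=0)

def geometric_median(points, iters=100, eps=1e-8):
    # Weiszfeld iteration: a standard robust aggregator, used here across group outputs.
    x = points.mean(axis=0)
    for _ in range(iters):
        d = np.linalg.norm(points - x, axis=1)
        w = 1.0 / np.maximum(d, eps)
        x_new = (points * w[:, None]).sum(axis=0) / w.sum()
        if np.linalg.norm(x_new - x) < eps:
            break
        x = x_new
    return x

def aggregate(worker_grads, group_size):
    # Stage 1: majority vote inside each redundant group of `group_size` workers.
    groups = [worker_grads[i:i + group_size]
              for i in range(0, len(worker_grads), group_size)]
    voted = np.stack([majority_vote(g) for g in groups])
    # Stage 2: robust aggregation across the (mostly filtered) group outputs.
    return geometric_median(voted)
```

Because the voting stage already removes most Byzantine replies, the expensive robust aggregator only runs over the much smaller set of group outputs, which is where the speed argument in the abstract comes from.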

Cited by 6 publications (9 citation statements). References 11 publications.
“…This includes countering faulty agents regardless of their behavior, i.e., Byzantine fault-tolerance, or Byzantine resilience [45,102,105], or modeling and countering certain kinds of failures or adversarial behaviors [30,59]. The other line of work intends to counter faulty agents by assigning redundant workloads to agents [18,86].…”
Section: Fault-tolerance in Distributed Optimization
confidence: 99%
“…The assignment of the same tasks to multiple agents is also known as algorithmic redundancy. DETOX [86] is an extension of Draco that combines algorithmic redundancy with robust aggregation, with increased speed and improved robustness.…”
Section: Gradient Coding
confidence: 99%
“…From (19), we observe that the mean of {m_i^k} over all the honest workers w ∉ B is an unbiased estimate of ∇f(x^k). Nevertheless, the geometric median of {m_i^k}, even when taken only over the honest workers w ∉ B and computed exactly, is a biased estimate of ∇f(x^k).…”
Section: A. Importance of Reducing Stochastic Gradient Noise
confidence: 99%
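
The unbiased-versus-biased contrast in this statement is easy to check numerically. The snippet below is a small, self-contained illustration using synthetic, zero-mean but skewed gradient noise and a generic Weiszfeld geometric median; the distributions and constants are invented for the demonstration and are not taken from the cited paper.

```python
import numpy as np

rng = np.random.default_rng(0)
true_grad = np.array([1.0, -2.0])

def geometric_median(points, iters=200, eps=1e-8):
    # Generic Weiszfeld iteration.
    x = points.mean(axis=0)
    for _ in range(iters):
        d = np.linalg.norm(points - x, axis=1)
        w = 1.0 / np.maximum(d, eps)
        x = (points * w[:, None]).sum(axis=0) / w.sum()
    return x

trials = 2000
mean_est, gm_est = np.zeros(2), np.zeros(2)
for _ in range(trials):
    # Zero-mean but skewed noise, so each honest worker's stochastic
    # gradient is still an unbiased sample of the true gradient.
    noise = rng.exponential(1.0, size=(25, 2)) - 1.0
    grads = true_grad + noise
    mean_est += grads.mean(axis=0) / trials
    gm_est += geometric_median(grads) / trials

print("average of means            :", mean_est)  # close to true_grad (unbiased)
print("average of geometric medians:", gm_est)    # systematically shifted (biased)
```

Averaged over many trials, the sample mean recovers the true gradient while the geometric median settles at a shifted point, which is the bias the citation refers to.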
“…Other aggregation rules include Krum [14], which selects the stochastic gradient with the minimal cumulative squared distance to a given number of nearest stochastic gradients, and RSA [15], which aggregates models rather than stochastic gradients by penalizing the differences between the local and global model parameters. Related works also include adversarial learning in distributed principal component analysis [16], escaping from saddle points in non-convex distributed learning under Byzantine attacks [17], and leveraging redundant gradients to improve robustness [18], [19].…”
confidence: 99%
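
As a point of reference for the Krum rule mentioned in this statement, a minimal sketch follows: each candidate gradient is scored by the sum of squared distances to its n − f − 2 nearest peers, and the lowest-scoring gradient is selected. The function name and the dense pairwise-distance implementation are illustrative choices, not the authors' code.

```python
import numpy as np

def krum(grads, f):
    """Select one gradient in the spirit of Krum.

    grads: list of worker gradient vectors; f: assumed number of Byzantine
    workers. Requires n > f + 2 so the neighbour count n - f - 2 is positive.
    """
    grads = np.stack(grads)
    n = len(grads)
    k = n - f - 2  # number of nearest neighbours counted in each score
    # Pairwise squared Euclidean distances between all worker gradients.
    dists = np.linalg.norm(grads[:, None, :] - grads[None, :, :], axis=-1) ** 2
    scores = []
    for i in range(n):
        d = np.sort(np.delete(dists[i], i))  # distances to the other workers
        scores.append(d[:k].sum())           # sum over the k closest peers
    return grads[int(np.argmin(scores))]
```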
“…The probabilistic codes suggested in this paper can be designed more flexibly to tolerate Byzantines: we can control the redundancy by choosing an appropriate connection probability p. Simulation results show that our codes, with an expected redundancy of E[r] = 2, enjoy significant gains over the uncoded scheme when n = 49 and b = 5, while the codes in [17] require a redundancy of r = 11. A recent work [26] suggested a framework, DETOX, which combines two existing schemes: computing redundant gradients and applying robust gradient aggregation methods. However, DETOX still suffers from high computational overhead compared to our scheme, since it is based on a robust aggregation scheme, e.g., a geometric median aggregator.…”
Section: Introduction
confidence: 99%
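
A toy sketch of the "connection probability" idea described in this statement: draw each worker-to-data-chunk assignment independently with probability p, chosen so that each chunk is replicated E[r] times in expectation. The helper name and the uniform random construction are assumptions made purely for illustration; the cited paper's actual code construction may differ.

```python
import numpy as np

rng = np.random.default_rng(1)

def random_assignment(n_workers, n_chunks, expected_redundancy):
    # Each worker-chunk edge is included independently with probability p,
    # so each chunk is assigned to n_workers * p = E[r] workers on average.
    p = expected_redundancy / n_workers
    mask = rng.random((n_workers, n_chunks)) < p
    return mask  # mask[i, j] == True -> worker i computes the gradient on chunk j

A = random_assignment(n_workers=49, n_chunks=49, expected_redundancy=2)
print("average chunks per worker:", A.sum(axis=1).mean())
print("average workers per chunk:", A.sum(axis=0).mean())  # ≈ E[r] = 2
```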