2019
DOI: 10.48550/arxiv.1907.12205
Preprint

DETOX: A Redundancy-based Framework for Faster and More Robust Gradient Aggregation

Abstract: To improve the resilience of distributed training to worst-case, or Byzantine, node failures, several recent approaches have replaced gradient averaging with robust aggregation methods. Such techniques can have high computational costs, often quadratic in the number of compute nodes, and offer only limited robustness guarantees. Other methods have instead used redundancy to guarantee robustness, but can only tolerate a limited number of Byzantine failures. In this work, we present DETOX, a Byzantine-resilient dist…
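
The abstract describes pairing two stages: replicate each gradient task across a small group of workers, filter each group's replies by majority vote, and then apply a robust aggregator across the filtered group outputs. The snippet below is a minimal NumPy sketch of that pattern under assumed details (exact-match voting within groups, a Weiszfeld geometric median across groups); it illustrates the general idea, not the paper's exact algorithm.

```python
import numpy as np

def majority_vote(group_grads):
    # Workers in a redundant group were assigned the same mini-batch, so honest
    # workers return identical vectors; pick any vector repeated by a majority.
    for g in group_grads:
        matches = sum(np.array_equal(g, h) for h in group_grads)
        if matches > len(group_grads) // 2:
            return g
    # No majority (too many Byzantine replies): fall back to a coordinate-wise median.
    return np.median(np.stack(group_grads), axis=0)

def geometric_median(points, iters=100, eps=1e-8):
    # Weiszfeld iteration: a standard robust aggregator, used here across group outputs.
    x = points.mean(axis=0)
    for _ in range(iters):
        d = np.linalg.norm(points - x, axis=1)
        w = 1.0 / np.maximum(d, eps)
        x_new = (points * w[:, None]).sum(axis=0) / w.sum()
        if np.linalg.norm(x_new - x) < eps:
            break
        x = x_new
    return x

def aggregate(worker_grads, group_size):
    # Stage 1: majority vote inside each redundant group of `group_size` workers.
    groups = [worker_grads[i:i + group_size]
              for i in range(0, len(worker_grads), group_size)]
    voted = np.stack([majority_vote(g) for g in groups])
    # Stage 2: robust aggregation across the (mostly filtered) group outputs.
    return geometric_median(voted)
```

Because the voting stage already removes most Byzantine replies, the expensive robust aggregator only runs over the much smaller set of group outputs, which is where the speed argument in the abstract comes from.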

Cited by 6 publications (9 citation statements). References 11 publications.
“…This includes countering faulty agents regardless of their behavior, i.e., Byzantine fault-tolerance, or Byzantine resilience [45,102,105], or modeling and countering certain kinds of failures or adversarial behaviors [30,59]. The other line of work intends to counter faulty agents by assigning redundant workloads to agents [18,86].…”
Section: Fault-tolerance in Distributed Optimization
confidence: 99%
“…The assignment of the same tasks to multiple agents is also known as algorithmic redundancy. DETOX [86] is an extension of Draco that combines algorithmic redundancy with robust aggregation, with increased speed and improved robustness.…”
Section: Gradient Coding
confidence: 99%
“…From (19), we observe that the mean of {m_i^k} over all the honest workers w ∉ B is an unbiased estimate of ∇f(x^k). Nevertheless, the geometric median of {m_i^k}, even when taken only over the honest workers w ∉ B and computed exactly, is a biased estimate of ∇f(x^k).…”
Section: A. Importance of Reducing Stochastic Gradient Noise
confidence: 99%
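
The unbiased-versus-biased contrast in this statement is easy to check numerically. The snippet below is a small, self-contained illustration using synthetic, zero-mean but skewed gradient noise and a generic Weiszfeld geometric median; the distributions and constants are invented for the demonstration and are not taken from the cited paper.

```python
import numpy as np

rng = np.random.default_rng(0)
true_grad = np.array([1.0, -2.0])

def geometric_median(points, iters=200, eps=1e-8):
    # Generic Weiszfeld iteration.
    x = points.mean(axis=0)
    for _ in range(iters):
        d = np.linalg.norm(points - x, axis=1)
        w = 1.0 / np.maximum(d, eps)
        x = (points * w[:, None]).sum(axis=0) / w.sum()
    return x

trials = 2000
mean_est, gm_est = np.zeros(2), np.zeros(2)
for _ in range(trials):
    # Zero-mean but skewed noise, so each honest worker's stochastic
    # gradient is still an unbiased sample of the true gradient.
    noise = rng.exponential(1.0, size=(25, 2)) - 1.0
    grads = true_grad + noise
    mean_est += grads.mean(axis=0) / trials
    gm_est += geometric_median(grads) / trials

print("average of means            :", mean_est)  # close to true_grad (unbiased)
print("average of geometric medians:", gm_est)    # systematically shifted (biased)
```

Averaged over many trials, the sample mean recovers the true gradient while the geometric median settles at a shifted point, which is the bias the citation refers to.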
“…Other aggregation rules include Krum [14], which selects the stochastic gradient with the minimal cumulative squared distance to a given number of nearest stochastic gradients, and RSA [15], which aggregates models rather than stochastic gradients by penalizing the differences between the local and global model parameters. Related works also include adversarial learning in distributed principal component analysis [16], escaping from saddle points in non-convex distributed learning under Byzantine attacks [17], and leveraging redundant gradients to improve robustness [18], [19].…”
confidence: 99%
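
As a point of reference for the Krum rule mentioned in this statement, a minimal sketch follows: each candidate gradient is scored by the sum of squared distances to its n − f − 2 nearest peers, and the lowest-scoring gradient is selected. The function name and the dense pairwise-distance implementation are illustrative choices, not the authors' code.

```python
import numpy as np

def krum(grads, f):
    """Select one gradient in the spirit of Krum.

    grads: list of worker gradient vectors; f: assumed number of Byzantine
    workers. Requires n > f + 2 so the neighbour count n - f - 2 is positive.
    """
    grads = np.stack(grads)
    n = len(grads)
    k = n - f - 2  # number of nearest neighbours counted in each score
    # Pairwise squared Euclidean distances between all worker gradients.
    dists = np.linalg.norm(grads[:, None, :] - grads[None, :, :], axis=-1) ** 2
    scores = []
    for i in range(n):
        d = np.sort(np.delete(dists[i], i))  # distances to the other workers
        scores.append(d[:k].sum())           # sum over the k closest peers
    return grads[int(np.argmin(scores))]
```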
“…The probabilistic codes suggested in this paper can be designed more flexibly to tolerate Byzantines: we can control the redundancy by choosing an appropriate connection probability p. Simulation results show that our codes, with an expected redundancy of E[r] = 2, enjoy significant gains over the uncoded scheme when n = 49 and b = 5, while the codes in [17] require a redundancy of r = 11. A recent work [26] suggested a framework, DETOX, which combines two existing schemes: computing redundant gradients and applying robust gradient aggregation methods. However, DETOX still suffers from high computational overhead compared to our scheme, since it is based on a robust aggregation scheme, e.g., a geometric median aggregator.…”
Section: Introduction
confidence: 99%
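
A toy sketch of the "connection probability" idea described in this statement: draw each worker-to-data-chunk assignment independently with probability p, chosen so that each chunk is replicated E[r] times in expectation. The helper name and the uniform random construction are assumptions made purely for illustration; the cited paper's actual code construction may differ.

```python
import numpy as np

rng = np.random.default_rng(1)

def random_assignment(n_workers, n_chunks, expected_redundancy):
    # Each worker-chunk edge is included independently with probability p,
    # so each chunk is assigned to n_workers * p = E[r] workers on average.
    p = expected_redundancy / n_workers
    mask = rng.random((n_workers, n_chunks)) < p
    return mask  # mask[i, j] == True -> worker i computes the gradient on chunk j

A = random_assignment(n_workers=49, n_chunks=49, expected_redundancy=2)
print("average chunks per worker:", A.sum(axis=1).mean())
print("average workers per chunk:", A.sum(axis=0).mean())  # ≈ E[r] = 2
```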