2019
DOI: 10.48550/arxiv.1901.09671
Preprint

ErasureHead: Distributed Gradient Descent without Delays Using Approximate Gradient Coding

Abstract: We present ErasureHead, a new approach for distributed gradient descent (GD) that mitigates system delays by employing approximate gradient coding. Gradient coded distributed GD uses redundancy to exactly recover the gradient at each iteration from a subset of compute nodes. ErasureHead instead uses approximate gradient codes to recover an inexact gradient at each iteration, but with higher delay tolerance. Unlike prior work on gradient coding, we provide a performance analysis that combines both delay and convergence guarantees…
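
To make the idea in the abstract concrete, the sketch below illustrates GD with approximate gradient coding: data partitions are replicated across workers, and each iteration uses an inexact gradient assembled from whichever partitions the fastest workers return. This is an illustration of the general scheme only, not the authors' exact construction; names such as `wait_fraction` and the simulated delay model are hypothetical.

```python
# Illustrative sketch (not the paper's exact method): approximate gradient-coded
# GD for least squares. Each partition is replicated on s workers; at every
# iteration the master keeps only the partial gradients that arrive before a
# cutoff and averages one copy per recovered partition.
import numpy as np

rng = np.random.default_rng(0)
n, d, k, s = 600, 10, 12, 2          # samples, features, partitions, replication
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

parts = np.array_split(np.arange(n), k)                      # k data partitions
# worker j is assigned s partitions (cyclic placement, a common choice)
assignment = [[(j + t) % k for t in range(s)] for j in range(k)]

def partial_grad(w, idx):
    Xi, yi = X[idx], y[idx]
    return Xi.T @ (Xi @ w - yi) / len(idx)

w = np.zeros(d)
lr, iters, wait_fraction = 0.1, 50, 0.75                     # wait for 75% of workers
for _ in range(iters):
    delays = rng.exponential(size=k)                         # simulated per-worker delays
    fast = np.argsort(delays)[: int(wait_fraction * k)]
    recovered = {}                                           # one partial gradient per partition
    for j in fast:
        for p in assignment[j]:
            if p not in recovered:
                recovered[p] = partial_grad(w, parts[p])
    # inexact gradient: average over whichever partitions were recovered
    g = np.mean(list(recovered.values()), axis=0)
    w -= lr * g

print("final loss:", 0.5 * np.mean((X @ w - y) ** 2))
```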

Cited by 19 publications (30 citation statements)
References 25 publications
“…This work has been extended in several directions. In [16], the authors developed algorithms to leverage partial computations at the stragglers; the communication and computation properties of GC were studied in [17], [18] and [19]; while GC was extended to distributed SGD in [20] and [21].…”
Section: PS
Citation type: mentioning
confidence: 99%
“…Coded computing methods with code rate r (a quantity between 0 and 1) make it possible to either recover the gradient exactly (e.g., [4]) or an approximation thereof (e.g., [5], [6], [21], [22]) from intermediate results computed by a subset of the workers, at the expense of increasing the computational load of each worker by a factor 1/r relative to GD. The gradient is recovered via a decoding operation (that typically reduces to solving a system of linear equations), the complexity of which usually increases superlinearly with the number of workers.…”
Section: Coded Computing
Citation type: mentioning
confidence: 99%
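
As a concrete illustration of the decoding step described in the statement above, the following sketch shows exact gradient recovery with a simple fractional-repetition code: each worker sends one linear combination of its partition gradients (raising its computational load by a factor 1/r = s), and the master recovers the full gradient by solving a small linear system over the non-straggling workers' rows. This is a generic example in the spirit of the cited exact-recovery schemes, not the decoder of any particular paper.

```python
# Minimal exact gradient coding sketch with a fractional-repetition code.
# Row j of the encoding matrix B is the combination computed by worker j;
# the master solves a^T B_alive = 1^T to recover the sum of all partition
# gradients from any set of non-straggling workers.
import numpy as np

k = 4                                  # partitions == workers here
s = 2                                  # each partition replicated s times -> rate r = 1/s
B = np.array([[1., 1., 0., 0.],
              [1., 1., 0., 0.],
              [0., 0., 1., 1.],
              [0., 0., 1., 1.]])

rng = np.random.default_rng(1)
partition_grads = rng.normal(size=(k, 3))      # toy per-partition gradients (dim 3)
worker_msgs = B @ partition_grads              # what each worker would send

# suppose worker 3 straggles; decode from workers {0, 1, 2}
alive = [0, 1, 2]
# find coefficients a with a^T B_alive = all-ones (least-squares solve)
a, *_ = np.linalg.lstsq(B[alive].T, np.ones(k), rcond=None)
decoded = a @ worker_msgs[alive]

print(np.allclose(decoded, partition_grads.sum(axis=0)))   # True: exact recovery
```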
“…Finally, for both PCA and logistic regression, the straggler resiliency afforded by coding is canceled out by the higher computational load. Here, we consider a code rate r = 45/49, which we find yields lower latency compared to the lower rates typically used in coded computing (e.g., in [4], [5], [6], [21], [22]).…”
Section: Artificial Scenario
Citation type: mentioning
confidence: 99%
“…In this paper, we focus on distributed machine learning setup, where the aim is to implement the iterative gradient descent algorithm. Coding techniques used in this setup are termed as gradient coding [2]- [9]. In gradient coding, the key idea is to create data partitions with coded redundancy such that they are robust to stragglers.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
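
The redundancy/straggler trade-off in the statement above can be checked directly: with a cyclic placement in which every partition lives on s workers, any s-1 stragglers can be tolerated, while some set of s stragglers cannot. The sketch below verifies this for a small configuration; the function and variable names are illustrative only.

```python
# Hedged sketch of the coded-redundancy idea: verify that a cyclic placement
# with replication s survives any s-1 stragglers but not every set of s.
from itertools import combinations

def cyclic_placement(num_workers: int, s: int):
    """Worker j holds partitions {j, j+1, ..., j+s-1} (mod num_workers)."""
    return [{(j + t) % num_workers for t in range(s)} for j in range(num_workers)]

def covers_all_partitions(placement, stragglers):
    survivors = [p for j, p in enumerate(placement) if j not in stragglers]
    return set().union(*survivors) == set(range(len(placement)))

n, s = 6, 3
placement = cyclic_placement(n, s)
# every choice of s-1 stragglers is tolerated...
print(all(covers_all_partitions(placement, set(c))
          for c in combinations(range(n), s - 1)))        # True
# ...but some choice of s stragglers is not
print(all(covers_all_partitions(placement, set(c))
          for c in combinations(range(n), s)))            # False
```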