2018
DOI: 10.1109/tit.2017.2736066

Speeding Up Distributed Machine Learning Using Codes

Abstract: Codes are widely used in many engineering applications to offer robustness against noise. In large-scale systems, several types of noise can degrade the performance of distributed machine learning algorithms (straggler nodes, system failures, or communication bottlenecks), but there has been little interaction cutting across codes, machine learning, and distributed systems. In this work, we provide theoretical insights on how coded solutions can achieve significant gains compared to uncoded ones. W…
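
The gain the abstract alludes to comes from coded matrix multiplication, one of the paper's core primitives: encode the data matrix with an (n, k) MDS code so that the result is recoverable from any k of n worker responses, making the slowest n − k workers irrelevant. A minimal sketch with a toy (3, 2) code in NumPy; the block sizes, the encode_blocks/decode helpers, and the single-straggler scenario are illustrative assumptions, not the paper's implementation:

import numpy as np

def encode_blocks(A):
    # Split A row-wise into two halves and append a parity block;
    # any 2 of the 3 encoded blocks suffice to recover A @ x.
    A1, A2 = np.split(A, 2, axis=0)
    return [A1, A2, A1 + A2]

def decode(results):
    # Recover A @ x from the products returned by any 2 of 3 workers.
    if 0 in results and 1 in results:
        return np.concatenate([results[0], results[1]])
    if 0 in results and 2 in results:
        # A2 x = (A1 + A2) x - A1 x
        return np.concatenate([results[0], results[2] - results[0]])
    # A1 x = (A1 + A2) x - A2 x
    return np.concatenate([results[2] - results[1], results[1]])

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))   # row count must be even for the split
x = rng.standard_normal(3)

blocks = encode_blocks(A)
results = {0: blocks[0] @ x, 2: blocks[2] @ x}  # worker 1 straggles; ignore it
assert np.allclose(decode(results), A @ x)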

Cited by 657 publications (428 citation statements: 4 supporting, 424 mentioning, 0 contrasting)
References 80 publications
“…The shifted exponential model for computation time, which is the sum of a constant (deterministic) term and a variable (stochastic) term, is motivated by the distribution model proposed in [28] for the latency of querying data files from cloud storage systems. As demonstrated in [10], as well as by our own experiments, the shifted exponential model provides a good fit for the distribution of computation times in cloud computing environments such as Amazon EC2 clusters.…”
Section: B. Network Model (supporting citation)
Confidence: 53%
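
Concretely, this model puts each worker's runtime at T = t0 + X with X ~ Exp(mu). The sketch below samples such runtimes and compares an uncoded job, which waits for all n workers, with an (n, k)-coded job, which waits only for the fastest k. All parameter values are hypothetical, and the larger per-worker load under coding is ignored for simplicity:

import numpy as np

rng = np.random.default_rng(1)
n, k = 20, 10        # hypothetical cluster size and MDS code dimension
t0, mu = 1.0, 1.0    # hypothetical shift (deterministic) and rate (stochastic)

# Shifted exponential runtimes: constant term t0 plus an Exp(mu) fluctuation.
runs = t0 + rng.exponential(1.0 / mu, size=(100_000, n))

uncoded = runs.max(axis=1)               # wait for the slowest of n workers
coded = np.sort(runs, axis=1)[:, k - 1]  # the k-th fastest worker suffices

print(f"mean uncoded time ~ {uncoded.mean():.2f}, mean coded time ~ {coded.mean():.2f}")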
“…As we state in the following theorem, HCMM provides an unbounded gain of Θ(log n) over the uncoded scheme in terms of expected running time. This result illustrates that, by leveraging coded computing, one achieves the same order-wise gain over heterogeneous clusters as over homogeneous clusters [10]. Theorem 2.…”
Section: Results (mentioning citation)
Confidence: 68%
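
The Θ(log n) figure follows from standard exponential order statistics; a sketch of the reasoning (not text from the citing paper): with n i.i.d. Exp(μ) fluctuations, the expected maximum grows like the harmonic number H_n, while the expected k-th smallest stays bounded when k is a fixed fraction of n:

\[
\mathbb{E}\bigl[X_{(n)}\bigr] = \frac{H_n}{\mu} = \Theta(\log n),
\qquad
\mathbb{E}\bigl[X_{(k)}\bigr] = \frac{H_n - H_{n-k}}{\mu} \approx \frac{1}{\mu}\log\frac{n}{n-k} = \Theta(1)
\quad \text{for } k = \theta n,\ \theta < 1.
\]

An uncoded job must wait for X_{(n)}, the slowest of n workers; an (n, k) MDS-coded job waits only for X_{(k)}, hence the Θ(log n) gap in expected running time.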