Hierarchical Coding for Distributed Computing

Park, Hyegyeong; Lee, Kangwook; Sohn, Jy-yong; Suh, Changho; Moon, Jaekyun

doi:10.48550/arxiv.1801.04686

Cited by 4 publications

(8 citation statements)

References 0 publications


“…In particular, [17] proposed the use of erasure codes for speeding up the computation of linear functions in distributed learning systems. Since then, many other works have analyzed the use of coding theory for distributed tasks with linear structure [11,22,18,10,32]. [29] proposed the use of coding theory for nonlinear machine learning tasks.…”

Section: Related Work (mentioning)

confidence: 99%
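The idea of using erasure codes to speed up linear computations, as described in the statement above, can be illustrated with a toy (3,2) code for matrix-vector multiplication. This is a minimal sketch under stated assumptions, not the scheme of any cited paper: the function names (`matvec`, `encode`, `decode_any_two`) are hypothetical, the matrix is assumed to have an even number of rows, and entries are integers so that decoding is exact. Three workers hold A1, A2, and A1 + A2; the product A·x is recoverable from any two of them, so one straggler can be ignored.

```python
def matvec(A, x):
    """Plain matrix-vector product on lists of lists."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def encode(A):
    """Split A into two row blocks and add one parity block A1 + A2.

    Assumes len(A) is even (an assumption of this sketch).
    """
    half = len(A) // 2
    A1, A2 = A[:half], A[half:]
    parity = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(A1, A2)]
    return [A1, A2, parity]


def decode_any_two(results):
    """Recover A @ x from the outputs of any two of the three workers.

    `results` maps worker index (0, 1, or 2) to that worker's output.
    """
    if len(results) < 2:
        raise ValueError("need results from at least two workers")
    if 0 in results and 1 in results:
        return results[0] + results[1]
    if 0 in results:
        # Have A1 x and (A1 + A2) x: subtract to obtain A2 x.
        y1 = results[0]
        y2 = [p - a for p, a in zip(results[2], y1)]
        return y1 + y2
    # Have A2 x and (A1 + A2) x: subtract to obtain A1 x.
    y2 = results[1]
    y1 = [p - b for p, b in zip(results[2], y2)]
    return y1 + y2
```

The master waits only for the two fastest workers, which is the source of the straggler tolerance; the price is 50% more total computation than the uncoded split.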

“…Recently, coding theory has provided a popular tool set for mitigating the effects of stragglers. Codes have recently been used in the context of machine learning, and applied to problems such as data shuffling [19,17], distributed matrix-vector and matrix-matrix multiplication [17,10], as well as distributed training [20,32,15,11,22,7,33]. In the context of distributed, gradient-based algorithms, Tandon et al [29] introduced gradient coding as a means to mitigate straggler delays.…”

Section: Introduction (mentioning)

confidence: 99%


ErasureHead: Distributed Gradient Descent without Delays Using Approximate Gradient Coding

Wang, Charles, Papailiopoulos

2019

Preprint


We present ErasureHead, a new approach for distributed gradient descent (GD) that mitigates system delays by employing approximate gradient coding. Gradient coded distributed GD uses redundancy to exactly recover the gradient at each iteration from a subset of compute nodes. ErasureHead instead uses approximate gradient codes to recover an inexact gradient at each iteration, but with higher delay tolerance. Unlike prior work on gradient coding, we provide a performance analysis that combines both delay and convergence guarantees. We establish that down to a small noise floor, ErasureHead converges as quickly as distributed GD and has faster overall runtime under a probabilistic delay model. We conduct extensive experiments on real world datasets and distributed clusters and demonstrate that our method can lead to significant speedups over both standard and gradient coded GD.
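The exact-recovery property of gradient coding mentioned in the abstract above can be sketched with a fractional repetition layout. This is an illustrative assumption, not the construction used by ErasureHead or any specific cited work: all names (`partial_gradients`, `assign_groups`, `worker_message`, `decode_gradient`) are hypothetical, the model is a one-dimensional least-squares fit, the number of workers is assumed to be divisible by s + 1, and the number of samples is assumed to be divisible by the number of groups. Workers in the same group compute the same block sum, so any n - s responders include at least one member of every group.

```python
def partial_gradients(w, data):
    """Per-sample gradients of (w*x - y)^2 with respect to w."""
    return [2 * (w * x - y) * x for x, y in data]


def assign_groups(n_workers, s):
    """Put s + 1 consecutive workers in each group (replication factor s + 1)."""
    if n_workers % (s + 1) != 0:
        raise ValueError("n_workers must be divisible by s + 1 in this sketch")
    return [i // (s + 1) for i in range(n_workers)]


def worker_message(group, grads, n_groups):
    """Sum of the per-sample gradients in the data block owned by `group`."""
    block = len(grads) // n_groups
    return sum(grads[group * block:(group + 1) * block])


def decode_gradient(messages, groups, n_groups):
    """Recover the full gradient from the non-straggling workers.

    `messages` maps worker id to its block sum. One message per group is
    enough; duplicates from the same group are ignored.
    """
    per_group = {}
    for worker_id, msg in messages.items():
        per_group.setdefault(groups[worker_id], msg)
    if len(per_group) < n_groups:
        raise ValueError("too many stragglers: some group did not respond")
    return sum(per_group.values())
```

With 6 workers and s = 2, for example, there are two groups of three workers each, and the full gradient is recovered no matter which two workers straggle. An approximate scheme would instead return the sum over whichever groups responded, accepting an inexact gradient in exchange for never waiting.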


“…However, the decoding process requires the computational complexity of O(k^3). Moreover, the coding schemes suggested in [4], [6], [14] encode the tasks along multiple dimensions, which can effectively reduce the decoding complexities by the virtue of parallel decoding or a peeling decoding scheme. However, these codes lose the MDS property and thereby cannot achieve the optimal computing time.…”

Section: B Related Work (mentioning)

confidence: 99%

“…Afterwards, it is shown that coded computation can effectively improve the performance of computing system with regards to: matrix-matrix multiplication [4]-[6], distributed gradient descent [7], [8], convolution [9], Fourier transform [10], and matrix sparsification [11], [12]. Moreover, regarding the matrix multiplication, new models reflecting the practical environment of computing systems such as the tree structure and heterogeneity are suggested and analyzed [13], [14]. In recent years, distributed cloud computing services such as Amazon EC2 enable customers to deal with large-scale computation [15].…”

Section: Introduction (mentioning)

confidence: 99%

“…Moreover, in the real world, the workers' latency statistics are heterogeneous due to a mixed use of hardwares with varying performances or the dynamics of multiple user requests over shared resources [19]. So far, the homogeneous grouped structure has been considered in [14], and the heterogeneous workers without grouped feature has been studied in [13]. However, system solutions which reflect both of the two practical conditions−grouped structure and heterogeneity (in terms of number of workers in each group as well as the bandwidth of the communication links associated with the groups)−are yet to be established.…”

Section: Introduction (mentioning)

confidence: 99%


Coded Matrix Multiplication on a Group-Based Model

Kim, Sohn, Moon

2019

2019 IEEE International Symposium on Information Theory (ISIT)

Self Cite


Coded distributed computing has been considered a promising technique for making large-scale systems robust to "straggler" workers. Yet practical system models that reflect the clustered or grouped structure of real-world computing servers have not been available, nor have the large variations in computing power and bandwidth across different servers been properly modeled. We suggest a group-based model to reflect practical conditions and develop an appropriate coding scheme for this model. The suggested code, called group code, employs parallel encoding for each group. We show that the suggested coding scheme can asymptotically achieve optimal computing time in regimes of infinite n, the number of workers. While theoretical analysis is conducted in the asymptotic regime, numerical results also show that the suggested scheme achieves near-optimal computing time for any finite but reasonably large n. Moreover, we demonstrate that the decoding complexity of the suggested scheme is significantly reduced by virtue of parallel decoding.


Distributed Gradient Descent with Coded Partial Gradient Computations

Ozfatura, Ulukuş, Gündüz

2019

ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)


Coded computation techniques provide robustness against straggling servers in distributed computing, with the following limitations: First, they increase decoding complexity. Second, they ignore computations carried out by straggling servers; and they are typically designed to recover the full gradient, and thus, cannot provide a balance between the accuracy of the gradient and per-iteration completion time. Here we introduce a hybrid approach, called coded partial gradient computation (CPGC), that benefits from the advantages of both coded and uncoded computation schemes, and reduces both the computation time and decoding complexity.

