2019 IEEE Global Communications Conference (GLOBECOM)
DOI: 10.1109/globecom38437.2019.9014006
Heterogeneous Coded Computation across Heterogeneous Workers

Abstract: The coded distributed computing framework enables large-scale machine learning (ML) models to be trained efficiently in a distributed manner while mitigating the straggler effect. In this work, we consider a multi-task assignment problem in a coded distributed computing system, where multiple masters, each with a different matrix multiplication task, assign computation tasks to workers with heterogeneous computing capabilities. Both dedicated and probabilistic worker assignment models are considered, with the obj…
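The coded computation the abstract describes can be illustrated with a toy straggler-tolerant matrix-vector multiplication. The sketch below uses a hypothetical (3, 2) MDS-style code in pure Python: the matrix is split into two row blocks sent to two workers, and a third "parity" worker computes the block sum, so the full product is recoverable from any two of the three workers. This is a minimal illustration of the general idea, not the specific heterogeneous assignment scheme of the paper.

```python
# Toy (3, 2) MDS-style coded matrix-vector multiplication (illustrative only).
# A is split into row blocks A1, A2; a parity worker computes (A1 + A2) @ x,
# so A @ x can be recovered from ANY two workers -- one straggler is tolerated.

def mat_vec(rows, x):
    """Multiply a matrix (given as a list of rows) by a vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in rows]

def add_blocks(r1, r2):
    """Element-wise sum of two equally shaped row blocks."""
    return [[a + b for a, b in zip(u, v)] for u, v in zip(r1, r2)]

def sub_vec(u, v):
    """Element-wise difference of two vectors."""
    return [a - b for a, b in zip(u, v)]

# Toy data: a 4x2 matrix split into two 2x2 blocks.
A1 = [[1, 2], [3, 4]]
A2 = [[5, 6], [7, 8]]
x = [1, 1]

# Tasks dispatched to three workers.
y1 = mat_vec(A1, x)                  # worker 1: top block
y2 = mat_vec(A2, x)                  # worker 2: bottom block
y3 = mat_vec(add_blocks(A1, A2), x)  # worker 3: parity block

# Suppose worker 2 straggles: its result is decoded from the parity worker.
y2_recovered = sub_vec(y3, y1)
full_product = y1 + y2_recovered     # equals A @ x despite the straggler
```

Any single slow worker can be ignored here at the cost of one redundant computation; the paper's contribution concerns how to size and assign such coded tasks when worker speeds are heterogeneous.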

Cited by 19 publications (4 citation statements)
References 18 publications
“…Future research directions include to consider coded computation techniques [15], [43]- [45] and task cancellation principles [26], [27], to further improve the efficiency of resource utilization, while guaranteeing the quality of service of computation tasks.…”
Section: Discussion
confidence: 99%
“…The statistical knowledge of the computation and communication latency for each worker can be acquired over time, and used for a more efficient allocation of computation tasks (e.g. as in [28], [35], [36]) as well as the coding scheme employed, for the sake of simplicity we assume homogeneous workers in this work.…”
Section: B. Coded Distributed Matrix-Vector Multiplication
confidence: 99%
“…For example, experiments on Amazon EC2 instances show that some workers can be five times slower than the typical performance [2]. There have been several attempts in the literature to mitigate the straggler effect by adding redundancy to the distributed computing system via coding [3], [4] or via scheduling computation tasks [5], [6]. However, these works overlooked the inherent heterogeneity in the computing capacity of different workers.…”
Section: Introduction
confidence: 99%