2019
DOI: 10.1109/tsp.2019.2952051
Computation Scheduling for Distributed Machine Learning With Straggling Workers

Cited by 58 publications (29 citation statements)
References 21 publications
“…More recently, some other streams of machine scheduling based on machine learning are also being studied. Research on the dynamic pricing algorithm based on reinforcement learning can work effectively without knowing the system dynamics information in advance, and the proposed energy scheduling algorithm can further reduce the system cost [81]. Another path for scheduling based on machine learning is about multi‐agent scheduling, which is currently very scarce.…”
Section: Results
confidence: 99%
“…It happens when large-sized models and datasets with multiple tasks are processed by on-device and battery-limited workers. In this case, as studied in [120], an effective solution could be scheduling the resultant stragglers while offloading their computationally demanding tasks (or even training data, at a loss of privacy) to neighbors or edge servers, a conceptual design known as mobile edge computing (MEC) [121], [122]. Such task offloading in MEC needs to take into account device heterogeneity [123], communication limitations [124], [125], and demand-supply capabilities of processing power [126], in addition to its impact on the tolerable training latency [122] and target training/inference accuracy [127], while ensuring devices' privacy [128].…”
Section: Scheduling and Offloading
confidence: 97%
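The offloading trade-off described in the statement above — local compute time versus transmission-plus-edge-compute time — can be sketched as a simple cost comparison. This is a minimal illustration, not the model from any of the cited works; all parameter names (`task_cycles`, `local_cps`, etc.) are hypothetical.

```python
def should_offload(task_cycles: float, local_cps: float,
                   uplink_bps: float, task_bits: float,
                   edge_cps: float) -> bool:
    """Offload iff upload time plus edge compute time beats local compute.

    task_cycles : CPU cycles the task requires
    local_cps   : device compute speed (cycles/second)
    uplink_bps  : uplink rate to the edge server (bits/second)
    task_bits   : size of the data that must be uploaded (bits)
    edge_cps    : edge-server compute speed (cycles/second)
    """
    local_time = task_cycles / local_cps
    offload_time = task_bits / uplink_bps + task_cycles / edge_cps
    return offload_time < local_time

# A slow device with a fast uplink benefits from offloading:
print(should_offload(task_cycles=5e9, local_cps=1e9,
                     uplink_bps=1e8, task_bits=1e7, edge_cps=2e10))
```

Real MEC schedulers would extend this with energy cost, queueing at the edge, and the accuracy/latency constraints the statement mentions.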
“…In an edge setup with limited resources in communication and computation, this introduces training stragglers degrading the overall training performance. In this view, client scheduling [35]–[37] and computation offloading [38]–[40], with the focus on guaranteeing target training/inference accuracy, have been identified as a promising research direction.…”
Section: B. Distributed Learning Over Wireless Network
confidence: 99%
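A common straggler-mitigation idea behind the client scheduling cited above is to wait only for the k fastest of n workers each round, so the round latency is set by the k-th order statistic rather than the slowest worker. The sketch below simulates this under an assumed exponential delay model; it is an illustration of the general technique, not the algorithm of the cited paper.

```python
import random

def simulate_round(num_workers: int = 10, k: int = 7, seed: int = 0):
    """One synchronous round that aggregates only the k fastest workers.

    Returns the round latency (time until the k-th result arrives)
    and the IDs of the workers whose results were used.
    """
    rng = random.Random(seed)
    # Hypothetical per-worker completion times (exponential delays).
    times = [(rng.expovariate(1.0), w) for w in range(num_workers)]
    fastest = sorted(times)[:k]
    round_latency = max(t for t, _ in fastest)  # wait for k-th fastest only
    used_workers = sorted(w for _, w in fastest)
    return round_latency, used_workers

latency, workers = simulate_round()
print(f"round latency: {latency:.3f}, workers used: {workers}")
```

Dropping the slowest n − k workers trades a small amount of gradient information per round for a large reduction in tail latency, which is the core tension the straggler literature studies.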