2018
DOI: 10.48550/arxiv.1810.08313
Preprint

Adaptive Communication Strategies to Achieve the Best Error-Runtime Trade-off in Local-Update SGD

Jianyu Wang,
Gauri Joshi

Abstract: Large-scale machine learning training, in particular, distributed stochastic gradient descent, needs to be robust to inherent system variability such as node straggling and random communication delays. This work considers a distributed training framework where each worker node is allowed to perform local model updates and the resulting models are averaged periodically. We analyze the true speed of error convergence with respect to wall-clock time (instead of the number of iterations), and analyze how it is affected…
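
Below is a minimal sketch of the local-update training pattern the abstract describes, assuming a toy least-squares objective: each worker runs tau local SGD steps on its own data shard, after which the worker models are averaged. The objective, worker count, learning rate, and averaging period here are illustrative placeholders, not the paper's experimental setup.

# Minimal sketch of local-update SGD with periodic model averaging.
# Illustration only: the quadratic objective, worker count, learning rate,
# and averaging period tau are placeholders, not the authors' configuration.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic least-squares problem; rows are sharded across workers to mimic
# data-parallel training.
n_workers, tau, n_rounds, lr = 4, 8, 50, 0.05
A = rng.normal(size=(400, 10))
x_true = rng.normal(size=10)
b = A @ x_true + 0.01 * rng.normal(size=400)
shards = np.array_split(np.arange(400), n_workers)

def stochastic_grad(x, rows):
    # Mini-batch gradient of 0.5 * ||A x - b||^2 restricted to one shard.
    batch = rng.choice(rows, size=32, replace=False)
    A_b = A[batch]
    return A_b.T @ (A_b @ x - b[batch]) / len(batch)

x_global = np.zeros(10)
for _ in range(n_rounds):                      # one round = tau local steps + one averaging step
    local_models = []
    for w in range(n_workers):                 # every worker restarts from the averaged model
        x_local = x_global.copy()
        for _ in range(tau):                   # tau local SGD updates, no communication
            x_local -= lr * stochastic_grad(x_local, shards[w])
        local_models.append(x_local)
    x_global = np.mean(local_models, axis=0)   # periodic averaging: the only communication point

print("distance to x_true:", np.linalg.norm(x_global - x_true))

Averaging is the only communication step, so a larger tau reduces communication overhead per iteration at the cost of slower error convergence per iteration; that tension is the error-runtime trade-off studied in the paper.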

Cited by 22 publications (44 citation statements)
References 24 publications

“…(Communication Pattern) A number of collective communication primitives can be used for data exchange between executors [70], such as Gather, AllReduce, and ScatterReduce. (Synchronization Protocol) The iterative nature of the optimization algorithms may imply certain dependencies across successive iterations, which force synchronizations between executors at certain boundary points [94]. A synchronization protocol has to be specified regarding when such synchronizations are necessary.…”
Section: Communication Mechanism
confidence: 99%
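
The quoted passage names collective primitives (Gather, AllReduce, ScatterReduce) and the synchronization points at which executors must exchange data. As a rough, self-contained illustration (not code from either paper), the sketch below simulates a ring AllReduce as a reduce-scatter pass followed by an all-gather pass over in-memory numpy arrays; a real executor would exchange the chunks over the network through a library such as MPI or NCCL.

# Illustrative simulation of ring AllReduce built from its two phases
# (reduce-scatter, then all-gather), as used at synchronization boundaries.
# Workers are entries in a Python list here; real systems send the chunks
# over the network instead of indexing shared memory.
import numpy as np

def ring_allreduce(worker_grads):
    """Sum gradients across workers by simulating the two ring phases."""
    n = len(worker_grads)
    # Each worker's gradient vector is split into n chunks.
    chunks = [np.array_split(g.astype(float), n) for g in worker_grads]

    # Phase 1: reduce-scatter.  In step s, worker w passes chunk (w - s) mod n
    # to its right neighbour, which accumulates it.  After n - 1 steps,
    # worker w holds the fully reduced chunk (w + 1) mod n.
    for step in range(n - 1):
        for w in range(n):
            src = (w - step) % n
            chunks[(w + 1) % n][src] += chunks[w][src]

    # Phase 2: all-gather.  In step s, worker w forwards the completed chunk
    # (w + 1 - s) mod n to its right neighbour, which overwrites its copy.
    for step in range(n - 1):
        for w in range(n):
            src = (w + 1 - step) % n
            chunks[(w + 1) % n][src] = chunks[w][src].copy()

    return [np.concatenate(c) for c in chunks]

# Four workers, each holding a constant "gradient"; every worker should end
# up with the element-wise sum 1 + 2 + 3 + 4 = 10.
grads = [np.full(8, i + 1.0) for i in range(4)]
reduced = ring_allreduce(grads)
assert all(np.allclose(r, 10.0) for r in reduced)

Splitting the reduction into these two ring phases keeps the amount of data each worker sends roughly independent of the number of workers, which is why AllReduce-style primitives are the default at dense synchronization boundaries.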
“…We have also used data parallelism to implement LambdaML. Other research topics in distributed ML include compression [6,7,52,53,93,96,97,101], decentralization [28,41,59,65,90,91,100], synchronization [4,19,26,46,66,68,87,94,102], straggler [8,56,83,89,98,105], data partition [1,3,36,55,77], etc.…”
Section: Related Work
confidence: 99%
“…• Fast aggregation via over-the-air computation [21], [84], [85], [86]
• Aggregation frequency control with limited bandwidth and computation resources [87], [88], [89]
• Data reshuffling via index coding and pliable index coding for improving training performance [90], [91], [92]
• Straggler mitigation via coded computing [93], [94], [95], [96], [97], [98], [99], [100], [101]
• Training in decentralized system mode [102], [103], [104], [105], [106], [107], [108], [109], [110], [111], [112]…”
Section: Data Partition Based Edge Training Systems
confidence: 99%
“…$\sum_{s=r\tau+1} \nabla L(\Theta_s)$ when $0 \le r < j$ and $Q_r := \sum_{s=r\tau+1}^{r\tau+i-1} \nabla L(\Theta_s)$ when $r = j$. Then, according to Equation (88) in [33], we have…”
Section: Convergence Analysis of DP-PASGD
confidence: 99%
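
Spelled out in full, and assuming the upper limit cut off from the first sum in the truncated quote is $(r+1)\tau$ (the end of averaging round $r$), the blocks $Q_0, \dots, Q_j$ simply partition the cumulative gradient sum up to local step $i-1$ of round $j$:

% Hedged sketch: the (r+1)\tau upper limit is an assumption about the truncated
% quote, inferred from the r = j case and the averaging period \tau.
\[
Q_r := \sum_{s=r\tau+1}^{(r+1)\tau} \nabla L(\Theta_s) \quad (0 \le r < j),
\qquad
Q_j := \sum_{s=j\tau+1}^{j\tau+i-1} \nabla L(\Theta_s),
\qquad
\sum_{r=0}^{j} Q_r = \sum_{s=1}^{j\tau+i-1} \nabla L(\Theta_s).
\]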