2019
DOI: 10.48550/arxiv.1909.02061
Preprint

Performance Analysis and Comparison of Distributed Machine Learning Systems

Abstract: Deep learning has permeated through many aspects of computing/processing systems in recent years. While distributed training architectures/frameworks are adopted for training large deep learning models quickly, there has not been a systematic study of the communication bottlenecks of these architectures and their effects on the computation cycle time and scalability. In order to analyze this problem for synchronous Stochastic Gradient Descent (SGD) training of deep learning models, we developed a performance m…
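
As a point of reference for the synchronous SGD setting named in the abstract, the per-iteration update and a generic first-order cost decomposition can be written as follows (an illustrative formulation, not the paper's specific performance model):

$$
w_{t+1} = w_t - \frac{\eta}{P}\sum_{p=1}^{P} \nabla L_p(w_t),
\qquad
T_{\text{iter}} \approx \max_{p} T^{\text{comp}}_{p} + T^{\text{comm}}(P),
$$

where $P$ is the number of workers, $\nabla L_p$ is the gradient computed on worker $p$'s data shard, and $T^{\text{comm}}(P)$ stands for the architecture-dependent communication cost whose effect on cycle time and scalability the abstract refers to.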

Cited by 3 publications (4 citation statements) · References 19 publications

“…Various methods for distributing Machine Learning (ML) workloads have been discussed in the literature [1], and most ML frameworks expose multiple distribution schemes through a consistent API. This section highlights some common distribution paradigms, focusing on the techniques used to scale DeepWalk using commodity hardware (which we refer to as HUGE-CPU) and TPUs (HUGE-TPU).…”
Section: Common ML Distribution Strategies
confidence: 99%
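
The "consistent API" point above is easiest to see in code. The sketch below is purely illustrative (it assumes TensorFlow's tf.distribute strategies and a toy Keras model, neither of which is prescribed by the cited works): swapping the strategy object changes the distribution scheme while the model-building and training code stay the same.

```python
# Illustrative only: a framework-level strategy object selects the
# distribution scheme; the code under strategy.scope() is unchanged.
import tensorflow as tf

def build_model():
    # Toy model used only to demonstrate the API shape.
    return tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10),
    ])

strategy = tf.distribute.MirroredStrategy()               # one host, all local devices
# strategy = tf.distribute.MultiWorkerMirroredStrategy()  # several hosts (CPU/GPU)
# strategy = tf.distribute.TPUStrategy(tpu_resolver)      # TPU slice (resolver assumed)

with strategy.scope():
    model = build_model()
    model.compile(
        optimizer="sgd",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    )
```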
“…In turn, the computed gradients are passed back to the PS, which uses them to update the weights again. In spite of its simplicity, this architecture shows poor scalability because all workers must communicate with the PS, and thus the PS easily becomes a bottleneck when there is a large number of workers in the cluster [4], [5], [18].…”
Section: A Distributed DNN Training
confidence: 99%
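
To make the all-to-one traffic pattern concrete, here is a minimal single-process sketch of the parameter-server (PS) pattern described above, written in plain NumPy with an assumed toy linear model and synthetic data (it is not the architecture or workload evaluated in the cited papers): every worker sends its gradient to the single PS, which performs the synchronous SGD step.

```python
# Toy, single-process simulation of synchronous SGD with a parameter server.
# Every worker's gradient goes through the single PS -- the all-to-one
# traffic that makes the PS a bottleneck as the worker count grows.
import numpy as np

class ParameterServer:
    def __init__(self, dim, lr=0.1):
        self.w = np.zeros(dim)   # shared model weights
        self.lr = lr

    def update(self, worker_grads):
        # Synchronous step: average the gradients from all workers, then apply.
        self.w -= self.lr * np.mean(worker_grads, axis=0)
        return self.w

def local_gradient(w, x, y):
    # Mean-squared-error gradient for a linear model (illustrative only).
    return 2.0 * x.T @ (x @ w - y) / len(y)

rng = np.random.default_rng(0)
shards = [(rng.normal(size=(32, 4)), rng.normal(size=32)) for _ in range(8)]  # 8 workers

ps = ParameterServer(dim=4)
for step in range(100):
    w = ps.w
    grads = [local_gradient(w, x, y) for x, y in shards]  # every worker talks to the PS
    ps.update(grads)
```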
“…Each worker applies this aggregated gradient to its weights for the next iteration. Since communication occurs only between neighboring workers, network traffic is decentralized and, consequently, higher scalability can be obtained [5].…”
Section: A Distributed DNN Training
confidence: 99%
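
The neighbor-only pattern described above is what ring all-reduce implements. The sketch below is a single-process NumPy illustration under an assumed even chunking of the gradient (real systems pipeline these exchanges and overlap them with computation): each worker exchanges exactly one chunk with its ring neighbor per step, so per-link traffic does not grow with the number of workers.

```python
# Toy, single-process ring all-reduce: gradients are averaged using only
# neighbor-to-neighbor chunk exchanges (reduce-scatter, then all-gather).
import numpy as np

def ring_allreduce(grads):
    """Average equal-length gradient vectors across len(grads) workers."""
    n = len(grads)
    # Each worker splits its local gradient into n chunks.
    chunks = [np.array_split(g.astype(float), n) for g in grads]

    # Reduce-scatter: after n-1 steps, worker i holds the full sum of chunk (i+1) % n.
    for step in range(n - 1):
        for i in range(n):
            c = (i - step) % n                      # chunk worker i forwards this step
            chunks[(i + 1) % n][c] += chunks[i][c]  # neighbor accumulates it

    # All-gather: circulate the completed chunks once around the ring.
    for step in range(n - 1):
        for i in range(n):
            c = (i + 1 - step) % n                  # completed chunk worker i forwards
            chunks[(i + 1) % n][c] = chunks[i][c].copy()

    return [np.concatenate(c) / n for c in chunks]  # every worker ends with the average

# Usage: four workers' local gradients reduce to the same averaged gradient.
rng = np.random.default_rng(0)
local = [rng.normal(size=10) for _ in range(4)]
averaged = ring_allreduce(local)
assert np.allclose(averaged[0], np.mean(local, axis=0))
```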