Proceedings of the Seventh ACM Symposium on Cloud Computing 2016
DOI: 10.1145/2987550.2987586

Ako: Decentralised Deep Learning with Partial Gradient Exchange

Abstract: Distributed systems for the training of deep neural networks (DNNs) with large amounts of data have vastly improved the accuracy of machine learning models for image and speech recognition. DNN systems scale to large cluster deployments by having worker nodes train many model replicas in parallel; to ensure model convergence, parameter servers periodically synchronise the replicas. This raises the challenge of how to split resources between workers and parameter servers so that the cluster CPU and network reso…
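To make the worker/parameter-server split described above concrete, the following is a minimal, illustrative Python/NumPy sketch of one synchronous parameter-server round. It is not the paper's implementation; the class and function names (ParameterServer, Worker, local_gradient) and the least-squares loss are assumptions chosen purely for illustration.

# Illustrative sketch only (not the paper's code): a simplified
# synchronous parameter-server round. Each worker computes a gradient
# on its data shard; the parameter server aggregates the gradients and
# updates the shared model parameters.
import numpy as np

class ParameterServer:
    def __init__(self, model_size, lr=0.01):
        self.params = np.zeros(model_size)  # shared model parameters
        self.lr = lr

    def apply_gradients(self, gradients):
        # Average the workers' gradients and take one SGD step.
        self.params -= self.lr * np.mean(gradients, axis=0)
        return self.params

class Worker:
    def __init__(self, data, labels):
        self.data, self.labels = data, labels

    def local_gradient(self, params):
        # Stand-in for a real back-propagation step: gradient of a
        # least-squares loss on this worker's data shard.
        preds = self.data @ params
        return self.data.T @ (preds - self.labels) / len(self.labels)

model_size = 8
ps = ParameterServer(model_size)
workers = [Worker(np.random.randn(32, model_size), np.random.randn(32))
           for _ in range(4)]

# One synchronisation round per step: gradients flow worker -> server,
# updated parameters flow server -> workers.
for step in range(10):
    grads = [w.local_gradient(ps.params) for w in workers]
    ps.apply_gradients(np.stack(grads))

In this pattern the workers and the parameter server compete for the same cluster CPU and network bandwidth, which is exactly the resource-split question the abstract raises.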

Cited by 61 publications
References 27 publications