2021
DOI: 10.1007/s11704-021-0445-2
DRPS: efficient disk-resident parameter servers for distributed machine learning

Abstract: Parameter server (PS) as the state-of-the-art distributed framework for large-scale iterative machine learning tasks has been extensively studied. However, existing PS-based systems often depend on memory implementations. With memory constraints, machine learning (ML) developers cannot train large-scale ML models in their rather small local clusters. Moreover, renting large-scale cloud servers is always economically infeasible for research teams and small companies. In this paper, we propose a disk-resident pa…

Cited by 11 publications (2 citation statements)
References 17 publications
“…In both asynchronous and synchronous training, aggregated gradients can be shared between GPUs through the two basic data-parallel training architectures: parameter server architecture and AllReduce architecture. Parameter server architecture [14] is a centralized architecture where all GPUs communicate to a dedicated GPU for gradients aggregation and updates. Alternately, AllReduce architecture [20] is a decentralized architecture where the GPUs share parameter updates in a ring network topology manner through the Allreduce operation.…”
Section: Data Parallelism (mentioning)
confidence: 99%
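The centralized pattern described in the statement above can be illustrated with a minimal, single-process sketch: a simulated server holds the authoritative parameters, each worker pushes a gradient, and the server averages and applies them. All names here (NUM_WORKERS, worker_gradient, and so on) are illustrative placeholders, not part of DRPS or any cited system; a real deployment would exchange these tensors over the network between worker GPUs and the server.

```python
import numpy as np

# Minimal single-process sketch of parameter-server-style aggregation.
# Names are illustrative placeholders; a real system would move these
# tensors over the network between worker GPUs and a dedicated server.

NUM_WORKERS = 4
DIM = 8
LEARNING_RATE = 0.1

# The "server" holds the authoritative copy of the model parameters.
server_params = np.zeros(DIM)

def worker_gradient(worker_id, params):
    """Stand-in for one worker's backward pass on its data shard."""
    rng = np.random.default_rng(worker_id)
    return rng.normal(size=params.shape)

for step in range(3):
    # Each worker pulls the current parameters and pushes its gradient.
    gradients = [worker_gradient(w, server_params) for w in range(NUM_WORKERS)]
    # The server aggregates (here: averages) the gradients and updates.
    server_params -= LEARNING_RATE * np.mean(gradients, axis=0)

print("parameters after 3 steps:", server_params)
```

In the disk-resident setting the abstract describes, the server's parameter state would presumably be kept on disk rather than entirely in memory, while the same pull/aggregate/update cycle applies.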
“…(iii) Distribution of the new parameters among the workers, and retraining of the DNN [71]. To aggregate and update gradients, either a centralized architecture such as parameter server architecture [72], or a decentralized architecture such as All-Reduce [73] is used.…”
mentioning
confidence: 99%
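For the decentralized alternative mentioned here, the sketch below simulates a ring all-reduce (reduce-scatter followed by all-gather) over in-memory NumPy arrays. The function name ring_allreduce and the worker/chunk bookkeeping are assumptions for illustration only, standing in for the GPU-to-GPU exchange that a collective-communication library would perform in practice.

```python
import numpy as np

def ring_allreduce(grads):
    """Toy ring all-reduce: grads[w] is worker w's gradient vector;
    returns one fully summed copy per worker."""
    n = len(grads)
    # Split every worker's gradient into n chunks, one per ring step.
    chunks = [np.array_split(g.astype(float), n) for g in grads]

    # Reduce-scatter phase: after n-1 steps, worker w holds the complete
    # sum for chunk (w + 1) % n.
    for step in range(n - 1):
        sends = [chunks[w][(w - step) % n].copy() for w in range(n)]
        for w in range(n):
            chunks[(w + 1) % n][(w - step) % n] += sends[w]

    # All-gather phase: circulate the completed chunks around the ring.
    for step in range(n - 1):
        sends = [chunks[w][(w + 1 - step) % n].copy() for w in range(n)]
        for w in range(n):
            chunks[(w + 1) % n][(w + 1 - step) % n] = sends[w]

    return [np.concatenate(c) for c in chunks]

# Usage: four "workers" with random gradients; every copy ends up equal
# to the elementwise sum of all gradients, with no central server.
rng = np.random.default_rng(0)
grads = [rng.normal(size=8) for _ in range(4)]
reduced = ring_allreduce(grads)
assert all(np.allclose(r, sum(grads)) for r in reduced)
```

The design point the citing papers contrast is visible here: the parameter-server sketch funnels all traffic through one node, while the ring variant spreads communication evenly across peers at the cost of more exchange steps.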