SC22: International Conference for High Performance Computing, Networking, Storage and Analysis 2022
DOI: 10.1109/sc41404.2022.00076
STRONGHOLD: Fast and Affordable Billion-Scale Deep Learning Model Training

Abstract: Deep neural networks (DNNs) with billion-scale parameters have demonstrated impressive performance in solving many tasks. Unfortunately, training a billion-scale DNN is out of the reach of many data scientists because it requires high-performance GPU servers that are too expensive to purchase and maintain. We present STRONGHOLD, a novel approach for enabling large DNN model training with no change to the user code. STRONGHOLD scales up the largest trainable model size by dynamically offloading data to the CPU R…

Cited by 4 publications (1 citation statement). References 39 publications.
“…Stronghold [58] introduces a work window method, which keeps only part of the model's layers and parameters in the GPU. Under this mechanism, the GPU processes only the model layers within the work window, transferring the rest to the CPU.…”
Section: Memory Swapping Techniques in Optimization
Confidence: 99%
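The work-window mechanism described in the citation statement can be illustrated with a minimal sketch. This is not STRONGHOLD's actual implementation; the `Layer` class, device labels, and window logic below are hypothetical, and real offloading would move tensors between GPU and CPU memory rather than toggling a string:

```python
class Layer:
    """A stand-in for one model layer; tracks which memory holds its parameters."""
    def __init__(self, idx):
        self.idx = idx
        self.device = "cpu"  # all layers start offloaded in CPU memory

def run_forward(layers, window_size):
    """Process layers in order, keeping at most `window_size` layers on the GPU.

    As the window slides forward, the layer falling out of the window is
    transferred back to the CPU, so GPU residency stays bounded.
    """
    gpu_resident_counts = []
    for i, layer in enumerate(layers):
        layer.device = "gpu"  # fetch the next layer into GPU memory
        if i >= window_size:
            # evict the layer that just left the work window
            layers[i - window_size].device = "cpu"
        gpu_resident_counts.append(
            sum(1 for l in layers if l.device == "gpu")
        )
    return gpu_resident_counts

layers = [Layer(i) for i in range(6)]
print(run_forward(layers, window_size=3))  # → [1, 2, 3, 3, 3, 3]
```

The printed trace shows the key property: no matter how many layers the model has, the number resident on the GPU never exceeds the window size, which is what lets the trainable model size scale with CPU rather than GPU memory.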