Deep neural networks (DNNs) with billion-scale parameters have demonstrated impressive performance on many tasks. Unfortunately, training a billion-scale DNN is out of reach for many data scientists because it requires high-performance GPU servers that are too expensive to purchase and maintain. We present STRONGHOLD, a novel approach for enabling large DNN model training with no change to the user code. STRONGHOLD scales up the largest trainable model size by dynamically offloading data to CPU RAM and enabling the use of secondary storage. It automatically determines the minimum amount of data that must be kept in GPU memory at any point, minimizing GPU memory usage. Compared to state-of-the-art offloading-based solutions, STRONGHOLD improves the trainable model size by 1.9x∼6.5x on a 32GB V100 GPU, with a 1.2x∼3.7x improvement in training throughput. It has been deployed in production, where it successfully supports large-scale DNN training.
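To make the offloading idea concrete, below is a minimal, hypothetical PyTorch sketch of layer-wise weight offloading: each layer's parameters live in CPU RAM and are staged onto the GPU only while that layer executes. The `OffloadedLayer` wrapper is our own illustrative name, not STRONGHOLD's actual API; the real system additionally overlaps transfers with compute, manages the backward pass and optimizer state, and sizes the GPU-resident working window automatically, none of which this forward-only sketch attempts.

```python
import torch
import torch.nn as nn

class OffloadedLayer(nn.Module):
    """Hypothetical wrapper: keeps a layer's weights in CPU RAM and
    copies them to the GPU only for the duration of its computation."""

    def __init__(self, layer: nn.Module, device: str = "cuda"):
        super().__init__()
        self.layer = layer.cpu()   # parameters live in host memory
        self.device = device

    def forward(self, x):
        self.layer.to(self.device)   # stage weights onto the GPU
        out = self.layer(x)
        self.layer.cpu()             # evict weights, freeing GPU memory
        return out

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Sequential(*[OffloadedLayer(nn.Linear(4096, 4096), device)
                        for _ in range(8)])

with torch.no_grad():                # forward-only demonstration
    x = torch.randn(2, 4096, device=device)
    y = model(x)                     # peak GPU usage ~ one layer's weights
```

In this toy setup the GPU holds roughly one layer's weights at a time instead of all eight, which is the essence of trading GPU memory for host-to-device transfer bandwidth.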