2016
DOI: 10.48550/arxiv.1609.01507
Preprint

Extreme Scale-out SuperMUC Phase 2 - lessons learned

Nicolay Hammer,
Ferdinand Jamitzky,
Helmut Satzger
et al.

Abstract: We report lessons learned during the friendly user block operation period of the new system at the Leibniz Supercomputing Centre (SuperMUC Phase 2).

Cited by 3 publications (2 citation statements)
References 5 publications

“…More powerful supercomputers [1], [2] and advanced libraries [3], [4], [5], [6], [7] enable the training of ever more complex models on bigger data sets using advanced processing units such as Graphics Processing Units (GPUs) and Tensor Processing Units (TPUs) at increasing speeds and efficiency. HPC hardware is advancing both through infrastructure of supercomputers, such as Fugaku [8], Summit [1] or the SuperMUC-NG [9], and through its components, such as TPU pods [2], specifically designed to ease large scale neural network training for users. Concurrent software improvements in form of more efficient libraries such as Horovod [6] allow executing general purpose code on large distributed clusters with minor code changes.…”
Section: Introduction (mentioning)
confidence: 99%
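
The Horovod remark in the statement above can be made concrete with a minimal sketch of the "minor code changes" involved. The model, data, and hyperparameters below are illustrative placeholders, not taken from the cited works; the sketch assumes one Horovod process per device.

```python
# Minimal sketch (not from the cited works) of the "minor code changes" that
# Horovod asks for when distributing an existing PyTorch training loop.
# The model, data, and hyperparameters are illustrative placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F
import horovod.torch as hvd

hvd.init()                                        # one process per rank
if torch.cuda.is_available():
    torch.cuda.set_device(hvd.local_rank())       # pin each rank to its local GPU

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(128, 10).to(device)             # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01 * hvd.size())

# Horovod-specific additions: wrap the optimizer, then sync the initial state.
optimizer = hvd.DistributedOptimizer(
    optimizer, named_parameters=model.named_parameters())
hvd.broadcast_parameters(model.state_dict(), root_rank=0)
hvd.broadcast_optimizer_state(optimizer, root_rank=0)

for step in range(100):                           # the loop itself is unchanged
    x = torch.randn(32, 128, device=device)       # dummy batch
    y = torch.randint(0, 10, (32,), device=device)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x), y)
    loss.backward()                               # gradients are allreduced here
    optimizer.step()
    if step % 20 == 0 and hvd.rank() == 0:
        print(f"step {step}: loss {loss.item():.3f}")
```

A script like this is typically launched with, e.g., `horovodrun -np 4 python train.py`, which starts one training process per rank; the training loop itself stays as it was in the single-device version.
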
“…To clarify the last point, let's consider the case of a very large cosmological simulation that was run within the LRZ Extreme Scaling Workshop in 2015 [14]. Such simulation (Magneticum Box0/mr) had 1.2 · 10^7 particles per node, each node was allocating 4GB for the Barnes Hut tree, 22GB for the basic quantities used in gravity (e.g.
Section: Memory Transfer (mentioning)
confidence: 99%

Gadget3 on GPUs with OpenACC

Ragagnin,
Dolag,
Wagner
et al. 2020
Preprint
Self Cite
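
As a rough cross-check (not taken from either paper), the per-node figures in the Memory Transfer statement above imply a per-particle footprint that can be worked out in a few lines of Python; treating GB as 2^30 bytes is an assumption here.

```python
# Back-of-the-envelope check (not from the cited papers) of the per-particle
# memory footprint implied by the quoted Magneticum Box0/mr figures:
# 1.2e7 particles per node, 4 GB for the Barnes-Hut tree, 22 GB for the
# basic gravity quantities. GB is treated as 2**30 bytes (an assumption).
particles_per_node = 1.2e7
tree_bytes = 4 * 2**30
gravity_bytes = 22 * 2**30

print(f"tree:    {tree_bytes / particles_per_node:7.0f} bytes/particle")
print(f"gravity: {gravity_bytes / particles_per_node:7.0f} bytes/particle")
print(f"total:   {(tree_bytes + gravity_bytes) / particles_per_node:7.0f} bytes/particle")
# ≈ 358 + 1969 ≈ 2327 bytes per particle for these two buffers alone.
```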