2017
DOI: 10.1002/cpe.4291

Preparing NERSC users for Cori, a Cray XC40 system with Intel many integrated cores

Abstract: The newest NERSC supercomputer Cori is a Cray XC40 system consisting of 2,388 Intel Xeon Haswell nodes and 9,688 Intel Xeon-Phi "Knights Landing" (KNL) nodes. Compared to the Xeon-based clusters NERSC users are familiar with, optimal performance on Cori requires consideration of KNL mode settings; process, thread, and memory affinity; fine-grain parallelization; vectorization; and use of the high-bandwidth MCDRAM memory. This paper describes our efforts preparing NERSC users for KNL through the NERSC Ex…

Citations: cited by 13 publications (15 citation statements)
References: 10 publications
“…We tested the implementation on the Knights Landing partition of the National Energy Research Scientific Computing Center's Cori Cluster [35]. The partition employs 9688 nodes with a single-socket Intel Xeon Phi and a combined theoretical peak performance of 29.5 PFlop.…”
Section: Results (mentioning)
confidence: 99%
“…The primary system used for the experiments in this article is a Cray XC40 installation at the NERSC located in Berkeley, California, USA (He et al, 2018) known as Cori. Significant dedicated time on the Cori machine enabled the accurate scaling measurements presented here.…”
Section: Methods (mentioning)
confidence: 99%
“…Hyper-threading could improve the application acceleration performance through increasing resource utilization by simultaneously running multiple threads/processes on the hardware threads on the core, making effective use of the cycles that would otherwise be wasted due to branch mis-predictions, data dependencies, cache misses, and/or waiting for other resources in a single thread/process execution on the core [43]. With the MIC, which provides four hardware threads per core, hyper-threading improved MCtandem’s performance slightly.…”
Section: Methods (mentioning)
confidence: 99%