Evaluating and Optimizing the NERSC Workload on Knights Landing

Barnes, Taylor; Cook, Brandon; Deslippe, Jack; Doerfler, Douglas; Friesen, Brian; He, Yun; Kurth, Thorsten; Koskela, T.; Lobet, Mathieu; Malas, Tareq B.; Oliker, Leonid; Ovsyannikov, Andrey; Sarje, Abhinav; Vay, Jean‐Luc; Vincenti, Henri; Williams, Samuel; Carrier, Pierre; Wichmann, Nathan; Wagner, Marcus; Kent, Paul; Kerr, Christopher; Dennis, John M.

doi:10.1109/pmbs.2016.010

Cited by 27 publications

(30 citation statements)

References 35 publications


“…Rosales et al [26] study the performance differences observed when using flat, cache and hybrid HBM configurations together with the effect of memory affinity and process pinning in Mantevo suite and NAS parallel benchmark. Barnes et al [27] present the initial performance results of 20 NESAP scientific applications running on KNL nodes of the Cori system and comparing KNL hardware features with traditional Intel Xeon architectures. This study mainly targets at how to effectively run NESAP applications in the Cori system whereas we focus on giving general guidelines on what kind of applications characteristics benefit from running on a hybrid memory system.…”

Section: Related Work (mentioning)

confidence: 99%

Exploring the Performance Benefit of Hybrid Memory System on HPC Environments

Gioiosa, Kestor, Cicotti, et al. 2017

2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)


Abstract: Hardware accelerators have become a de facto standard to achieve high performance on current supercomputers and there are indications that this trend will increase in the future. Modern accelerators feature high-bandwidth memory next to the computing cores. For example, the Intel Knights Landing (KNL) processor is equipped with 16 GB of high-bandwidth memory (HBM) that works together with conventional DRAM memory. Theoretically, HBM can provide ∼4× higher bandwidth than conventional DRAM. However, many factors impact the effective performance achieved by applications, including the application memory access pattern, the problem size, the threading level and the actual memory configuration. In this paper, we analyze the Intel KNL system and quantify the impact of the most important factors on the application performance by using a set of applications that are representative of scientific and data-analytics workloads. Our results show that applications with regular memory access benefit from MCDRAM, achieving up to 3× performance when compared to the performance obtained using only DRAM. On the contrary, applications with random memory access pattern are latency-bound and may suffer from performance degradation when using only MCDRAM. For those applications, the use of additional hardware threads may help hide latency and achieve higher aggregated bandwidth when using HBM.


“…The NESAP optimization efforts and results are documented in more detail in Barnes et al [16] and Kurth et al [17]. 3 | HETEROGENEOUS PROGRAMMING ENVIRONMENT. The heterogeneous Haswell/KNL system is considerably more complex than a homogeneous Xeon cluster. We describe some challenges encountered and the recommendations we formed around building applications with cross-compilation and binary compatibility.…”

Section: NESAP Results (mentioning)

confidence: 99%

“…The NESAP optimization efforts and results are documented in more detail in Barnes et al and Kurth et al…”

Section: NESAP (mentioning)

confidence: 99%

Preparing NERSC users for Cori, a Cray XC40 system with Intel many integrated cores

Cook, Deslippe, et al. 2017

Concurrency and Computation

Self Cite


Summary: The newest NERSC supercomputer Cori is a Cray XC40 system consisting of 2,388 Intel Xeon Haswell nodes and 9,688 Intel Xeon-Phi "Knights Landing" (KNL) nodes. Compared to the Xeon-based clusters NERSC users are familiar with, optimal performance on Cori requires consideration of KNL mode settings; process, thread, and memory affinity; fine-grain parallelization; vectorization; and use of the high-bandwidth MCDRAM memory. This paper describes our efforts preparing NERSC users for KNL through the NERSC Exascale Science Application Program, Web documentation, and user training. We discuss how we configured the Cori system for usability and productivity, addressing programming concerns, batch system configurations, and default KNL cluster and memory modes. System usage data, job completion analysis, programming and running jobs issues, and a few successful user stories on KNL are presented.


“…Unlike GPUs and FPGAs, which mandate that code developers use specialized programming models, KNL facilitates application development and porting by providing standard language support (C, C++, Fortran, etc), familiar parallel programming models, and extensive compiler support. However, to obtain maximum performance on the KNL, significant refactoring and optimization of application codes are required to exploit key architectural innovations that KNL features: wide vector units, many‐core node design, and deep memory hierarchy. In this paper, the experience and insights gained in porting FEFLO (finite element code for the solution of compressible and incompressible flows) to the KNL platform are presented.…”

Section: Introduction (mentioning)

confidence: 99%

“…However, to obtain maximum performance on the KNL, significant refactoring and optimization of application codes are required to exploit key architectural innovations that KNL features-wide vector units, many-core node design, and deep memory hierarchy. [1][2][3][4][5][6] In this paper, the experience and insights gained in porting FEFLO (finite element code for the solution of compressible and incompressible flows) to the KNL platform are presented. FEFLO is a typical large-scale, production legacy code that has previously been ported and run on vector and GPU hardware.…”

Section: Introduction (mentioning)

confidence: 99%

Running large‐scale CFD applications on Intel‐KNL–based clusters

Tiwari, Cauble-Chantrenne, Jundt, et al. 2017

Numerical Methods in Fluids


Summary: Intel's latest Xeon Phi processor, Knights Landing (KNL), has the potential to provide over 2.6 TFLOPS. However, to obtain maximum performance on the KNL, significant refactoring and optimization of application codes are still required to exploit key architectural innovations that KNL features: wide vector units, many‐core node design, and deep memory hierarchy. The experience and insights gained in porting and running FEFLO (a typical edge‐based finite element code for the solution of compressible and incompressible flows) on the KNL platform are described in this paper. In particular, optimizations used to extract on‐node parallelism via vectorization and multithreading and improve internode communication are considered. These optimizations resulted in a 2.3× performance gain on 16‐node runs of FEFLO, with the potential for larger performance gains as the code is scaled beyond 16 nodes. The impact of the different configurations of KNL's on‐package MCDRAM (Multi‐Channel DRAM) memory on FEFLO's performance is also explored. Finally, the performance of the optimized versions of FEFLO for KNL and Haswell (Intel Xeon) is compared.
