Preparing NERSC users for Cori, a Cray XC40 system with Intel many integrated cores

He, Yun; Cook, Brandon; Deslippe, Jack; Friesen, Brian; Gerber, R.; Hartman-Baker, Rebecca; Koniges, Alice; Kurth, Thorsten; Leak, Stephen; Yang, Woo-Sun; Zhao, Zhengji; Baron, E.; Hauschildt, P. H.

doi:10.1002/cpe.4291

Cited by 13 publications

(15 citation statements)

References 10 publications

“…We tested the implementation on the Knights Landing partition of the National Energy Research Scientific Computing Center's Cori Cluster [35]. The partition employs 9688 nodes with a single-socket Intel Xeon Phi and a combined theoretical peak performance of 29.5 PFlop.…”

Section: Results (mentioning)

confidence: 99%

A UPC++ Actor Library and Its Evaluation On a Shallow Water Proxy Application

Pöppl

Baden

Bäder

2019

2019 IEEE/ACM Parallel Applications Workshop, Alternatives to MPI (PAW-ATM)

Programmability is one of the key challenges of Exascale Computing. Using the actor model for distributed computations may be one solution. The actor model separates computation from communication while still enabling their overlap. Each actor possesses specified communication endpoints to publish and receive information. Computations are undertaken based on the data available on these channels. We present a library that implements this programming model using UPC++, a PGAS library, and evaluate three different parallelization strategies, one based on rank-sequential execution, one based on multiple threads in a rank, and one based on OpenMP tasks. In an evaluation of our library using shallow water proxy applications, our solution compares favorably against an earlier implementation based on X10, and a BSP-based approach.

“…The primary system used for the experiments in this article is a Cray XC40 installation at the NERSC located in Berkeley, California, USA (He et al, 2018) known as Cori. Significant dedicated time on the Cori machine enabled the accurate scaling measurements presented here.…”

Section: Methods (mentioning)

confidence: 99%

Harnessing billions of tasks for a scalable portable hydrodynamic simulation of the merger of two stars

Heller

Lelbach

Huck

et al. 2019

The International Journal of High Performance Computing Applica

We present a highly scalable demonstration of a portable asynchronous many-task programming model and runtime system applied to a grid-based adaptive mesh refinement hydrodynamic simulation of a double white dwarf merger with 14 levels of refinement that spans 17 orders of magnitude in astrophysical densities. The code uses the portable C++ parallel programming model that is embodied in the HPX library and being incorporated into the ISO C++ standard. The model represents a significant shift from existing bulk synchronous parallel programming models under consideration for exascale systems. Through the use of the Futurization technique, seemingly sequential code is transformed into wait-free asynchronous tasks. We demonstrate the potential of our model by showing results from strong scaling runs on National Energy Research Scientific Computing Center's Cori system (658,784 Intel Knight's Landing cores) that achieve a parallel efficiency of 96.8% using billions of asynchronous tasks.

“…Hyper-threading could improve the application acceleration performance through increasing resource utilization by simultaneously running multiple threads/processes on the hardware threads on the core, making effective use of the cycles that would otherwise be wasted due to branch mis-predictions, data dependencies, cache misses, and/or waiting for other resources in a single thread/process execution on the core [43]. With the MIC, which provides four hardware threads per core, hyper-threading improved MCtandem’s performance slightly.…”

Section: Methods (mentioning)

confidence: 99%

MCtandem: an efficient tool for large-scale peptide identification on many integrated core (MIC) architecture

et al. 2019

BMC Bioinformatics

Background: Tandem mass spectrometry (MS/MS)-based database searching is a widely acknowledged and widely used method for peptide identification in shotgun proteomics. However, due to the rapid growth of spectra data produced by advanced mass spectrometry and the greatly increased number of modified and digested peptides identified in recent years, the current methods for peptide database searching cannot rapidly and thoroughly process large MS/MS spectra datasets. A breakthrough in efficient database search algorithms is crucial for peptide identification in computational proteomics.

Results: This paper presents MCtandem, an efficient tool for large-scale peptide identification on Intel Many Integrated Core (MIC) architecture. To support big data processing capability, a novel parallel match scoring algorithm, named MIC-SDP (spectrum dot product), and its two-level parallelization are presented in MCtandem’s design. In addition, a series of optimization strategies on both the host CPU side and the MIC side, which include pre-fetching, an optimized communication overlapping scheme, multithreading and hyper-threading, are exploited to improve the execution performance.

Conclusions: For fair comparisons, we first set up experiments and verified a 28-fold speedup on a single MIC against the original CPU-based implementation. We then executed MCtandem on a very large dataset on an MIC cluster (a component of the Tianhe-2 supercomputer) and achieved much higher scalability than a benchmark MapReduce-based program, MR-Tandem. MCtandem is an open-source software tool implemented in C++. The source code and the parameter settings are available at https://github.com/LogicZY/MCtandem.
