Summary
The high-performance conjugate gradients (HPCG) and high-performance geometric multi-grid (HPGMG) benchmarks are alternatives to the traditional LINPACK benchmark (HPL) for measuring the performance of modern HPC platforms. We ran the HPCG and HPGMG benchmarks on Blue Waters, a Cray XE6/XK7 hybrid supercomputer at the National Center for Supercomputing Applications (NCSA). We first tested the benchmarks on CPU-based and GPU-enabled nodes separately and analyzed the characteristic parameters that affect their performance. Based on these analyses, we then ran HPCG and HPGMG in multiple program, multiple data (MPMD) mode in the Cray Linux Environment to measure their hybrid performance across CPU-based and GPU-enabled nodes together. We observed and analyzed several performance issues during those tests. Based on the lessons learned from this study, we provide recommendations on how to optimize science applications on modern hybrid HPC platforms.
KEYWORDS
GPU, heterogeneous computing, HPC benchmark, hybrid HPC platforms, MPMD mode
INTRODUCTION
Since 1979, the LINPACK benchmark (HPL) [1] has been used to measure the performance of HPC systems, and the TOP500 list has been updated accordingly to rank the world's most powerful supercomputers. However, many have questioned whether HPL is a good metric for modern HPC performance. [2][3][4][5] One of the main criticisms is its low memory-to-flop (Byte/Flop) ratio. The Byte/Flop ratios of many applications in molecular dynamics, weather forecasting, astrophysics, particle physics, structural analysis, and fluid dynamics lie between 10^-1 and 10, while the ratio of the HPL benchmark is less than 10^-3. Since 2014, the HPC community has shared results from the high-performance conjugate gradients (HPCG) [3,6] and high-performance geometric multi-grid (HPGMG) [4,7] benchmarks, which represent modern HPC performance for many science and engineering applications more faithfully, at least in terms of Byte/Flop ratios.
This study extends our continuing effort [8] to identify appropriate benchmarks for modern HPC systems. We employed the HPCG and HPGMG benchmarks to measure the performance of the Blue Waters system located at the National Center for Supercomputing Applications (NCSA). Blue Waters [9,10] is a Cray XE6/XK7 hybrid supercomputer with 22 640 CPU-based XE6 nodes and 4228 GPU-enabled XK7 nodes. Each dual-socket XE6 node has two AMD Interlagos model 6276 processors with a nominal clock speed of at least 2.3 GHz and 64 GB of physical memory, while each XK7 accelerator node has one Interlagos model 6276 processor and one NVIDIA GK110 "Kepler" K20X accelerator, with 32 GB of CPU memory and 6 GB of GPU memory. We first tested HPCG and HPGMG on dual-socket CPU-based nodes and single-socket GPU-enabled nodes separately.
After analyzing numerous configurations for optimal performance, we moved on to multiple program, multiple data (MPMD) runs to evaluate the combined performance of the CPU-based XE6 nodes and GPU-enabled XK7 nodes. ...
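
The Byte/Flop contrast discussed in the introduction can be illustrated with a back-of-the-envelope estimate. The sketch below is not taken from the benchmarks themselves. It assumes an idealized dense LU factorization (as in HPL) that streams the matrix from memory once, and a sparse matrix-vector product in CSR format with 27 nonzeros per row (the stencil used by HPCG), 8-byte values, 4-byte indices, and a perfectly cached input vector. The function names and the traffic model are illustrative assumptions.

```python
def dense_lu_bytes_per_flop(n: int, word_bytes: int = 8) -> float:
    """Idealized Byte/Flop ratio for LU factorization of an n x n matrix.

    Assumption: every matrix entry is moved from memory exactly once,
    which is a lower bound on real memory traffic.
    """
    flops = (2.0 / 3.0) * n**3        # leading-order LU operation count
    bytes_moved = word_bytes * n**2   # one pass over the dense matrix
    return bytes_moved / flops


def csr_spmv_bytes_per_flop(nnz_per_row: int = 27,
                            word_bytes: int = 8,
                            index_bytes: int = 4) -> float:
    """Rough per-row Byte/Flop ratio for y = A x with A stored in CSR.

    Assumption: the input vector x is perfectly cached, so each entry
    is charged only once per row.
    """
    flops = 2 * nnz_per_row                                  # multiply + add per nonzero
    bytes_moved = nnz_per_row * (word_bytes + index_bytes)   # values and column indices
    bytes_moved += 2 * word_bytes                            # one x entry read, one y entry written
    bytes_moved += index_bytes                               # one row-pointer entry
    return bytes_moved / flops


if __name__ == "__main__":
    print(f"dense LU, n = 100000: {dense_lu_bytes_per_flop(100_000):.1e} Byte/Flop")
    print(f"CSR SpMV, 27 nnz/row: {csr_spmv_bytes_per_flop():.2f} Byte/Flop")
```

Under these assumptions the dense factorization falls well below 10^-3 Byte/Flop for large n, while the sparse kernel lands in the 10^-1 to 10 range quoted for science applications. Real measurements depend on cache behavior and blocking, so the numbers should be read only as orders of magnitude.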