Comparing different (accelerated) cluster architectures with a single application is challenging, because the application has to be optimized with respect to platform-dependent features. In this work, we demonstrate such an optimization for a data mining algorithm that solves regression and classification problems on vast data sets. Our technique is based on least squares regression, and its major component is the iterative, matrix-free solution of a linear system of equations. By processing data sets ranging from several hundred thousand instances to multi-million data points in strong-scaling and weak-scaling settings, we are able to estimate the amount of parallelism needed to unleash the performance of classic CPU-based machines and of clusters employing Intel Xeon Phi coprocessors and NVIDIA Kepler GPUs. Only in strong-scaling experiments do GPUs and coprocessors suffer from the tremendous degree of parallelism they require, being outperformed by dual-socket Intel Sandy Bridge nodes at large scale (more than 64 nodes/accelerators). In weak-scaling scenarios, however, a single accelerator achieves a speed-up of more than 2X over an entire CPU node.

... the NASA Advanced Supercomputing (NAS) Division parallel benchmark suite [9], which requires several application kernels to be run (including iterative solvers and fast Fourier transforms). For accelerated clusters, the Scalable Heterogeneous Computing benchmark suite [10] is a good candidate: it implements nearly all NAS benchmarks in OpenCL and CUDA and can easily be executed on accelerators and GPUs. However, research and procurements performed in recent years have demonstrated that running (just) application kernels might not be sufficient: Sandia Labs highlighted how mini-applications or proxy applications can be used to understand the performance of a supercomputer and even to influence its future development [11]. There, the benchmarks are not limited to kernels; they are simplified versions of real simulation codes stemming from several application domains. A similar approach was chosen for the procurement of the latest peta-scale system in Germany, 'SuperMUC' at the Leibniz Supercomputing Centre: according to Brehm [12], 45% of the benchmarks required during this process were full applications. Finally, it was recently proposed [14] to rank the machines of the Top500 list additionally (analogous to the Green500 list [13], which ranks by power consumption) based on a high-performance conjugate gradient (HPCG) implementation.

In this work, we apply the idea of an HPC benchmark to a full and relevant application: classification and regression of vast data sets. It exhibits properties distinct from those of the benchmarks discussed earlier and poses additional challenges to current and future HPC systems; we thus propose it as a further extension of an application benchmark portfolio. Furthermore, we demonstrate its use to benchmark different clusters and supercomputers. A fair application-driven comparison is ensured by optimizing our dat...
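As an illustration of the matrix-free solver named in the abstract, the following sketch solves a regularized least squares problem with the conjugate gradient (CG) method, applying the system operator only implicitly. This is a generic illustration, not the paper's implementation: the form of the regularization, the problem dimensions, and all names (cg_matrix_free, apply_A, Phi, lam) are assumptions chosen for the example.

    import numpy as np

    def cg_matrix_free(apply_A, b, tol=1e-8, max_iter=500):
        # Conjugate gradient for a symmetric positive definite operator.
        # apply_A is a callback computing A @ v; A itself is never stored.
        x = np.zeros_like(b)
        r = b - apply_A(x)          # initial residual
        p = r.copy()                # initial search direction
        rs_old = r @ r
        for _ in range(max_iter):
            Ap = apply_A(p)
            alpha = rs_old / (p @ Ap)
            x += alpha * p
            r -= alpha * Ap
            rs_new = r @ r
            if np.sqrt(rs_new) < tol:
                break
            p = r + (rs_new / rs_old) * p
            rs_old = rs_new
        return x

    rng = np.random.default_rng(0)
    Phi = rng.standard_normal((100000, 64))   # instances x basis functions (made-up sizes)
    y = rng.standard_normal(100000)           # target values
    lam = 1e-3                                # regularization strength (assumed)

    # Normal equations (Phi^T Phi + lam*I) w = Phi^T y, applied on the fly:
    # two matrix-vector products per iteration, no assembled system matrix.
    w = cg_matrix_free(lambda v: Phi.T @ (Phi @ v) + lam * v, Phi.T @ y)

Each CG iteration thus reduces to operator applications over the full data set, and it is these matrix-vector products that must be parallelized and tuned for each target platform.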