John Baron scite author profile

SUMMARY: The SPEC High-Performance Group has developed the benchmark suite SPEC MPI2007 and its run rules over the last few years. The purpose of the SPEC MPI2007 benchmark and its run rules is to further the cause of fair and objective benchmarking of high-performance computing systems. The rules help to ensure that the published results are meaningful, comparable to other results, and reproducible. MPI2007 includes 13 technical computing applications from the fields of computational fluid dynamics, molecular dynamics, electromagnetism, geophysics, ray tracing, and hydrodynamics. We describe the benchmark suite, and compare it with other benchmark suites.

Performance Evaluation of an Intel Haswell- and Ivy Bridge-Based Supercomputer Using Scientific and Engineering Applications

Saini, Hood, Chang et al. 2016

We present a performance evaluation conducted on a production supercomputer of the Intel Xeon Processor E5-2680v3, a twelve-core implementation of the fourth-generation Haswell architecture, and compare it with the Intel Xeon Processor E5-2680v2, an Ivy Bridge implementation of the third-generation Sandy Bridge architecture. Several new architectural features have been incorporated in Haswell, including improvements in all levels of the memory hierarchy as well as improvements to vector instructions and power management. We critically evaluate these new features of Haswell and compare them with Ivy Bridge using several low-level benchmarks, including a subset of HPCC, HPCG, and four full-scale scientific and engineering applications. We also present a model that predicts the performance of HPCG and Cart3D to within 5%, and of Overflow to within 10%.

NUMA-aware Scalable Graph Traversal on SGI UV Systems

Yasui, Goh, Baron et al. 2016

Breadth-first search (BFS) is one of the most fundamental processing algorithms in graph theory. We previously presented a scalable BFS algorithm based on Beamer's direction-optimizing algorithm for non-uniform memory access (NUMA)-based systems, in which the NUMA architecture was carefully considered. This paper presents our new implementation, which reduces remote memory access in the top-down direction of the direction-optimizing algorithm. We also discuss numerical results obtained on the SGI UV 2000 and UV 300 systems, which are shared-memory supercomputers based on a cache-coherent (cc)-NUMA architecture that can handle thousands of threads on a single operating system. Our implementation has achieved performance rates of 219 billion edges per second on a Kronecker graph with 2^34 vertices and 2^38 edges on a rack of an SGI UV 300 system with 1,152 threads. This result exceeds the fastest entry for a shared-memory system on the current Graph500 list presented in November 2015, which includes our previous implementation.

MPInside

Thomas, Panziera, Baron 2010

Performance analysis and prediction of parallel applications using the Message-Passing Interface (MPI) standard is a challenging task. Collecting, organizing, and making sense of profiling data for MPI jobs of even modest scale is difficult and time-consuming. The task is further complicated by the inherent difficulty in interpreting the resulting communication measurements. In this paper we introduce MPInside, a new profiling and diagnostic tool that overcomes these constraints with carefully considered choices for measurement techniques, capabilities, and output formats. Using examples from real-world applications, we illustrate the innovative features of the tool, including late senders for point-to-point calls and unaligned collective calls, all in an instrumentation-free framework. We also demonstrate the in-flight modeling capabilities of MPInside with several "what if" experiments.

The MPInside project began as an investigation on parallel applications using the MPI standard. With a classical profiling tool, one measures the time an application spends in the user code versus the MPI library. Usually, when the computation time dominates, the application scales well. On the other hand, a large percentage of communication time typically indicates a poor parallel efficiency. One would naively believe that better communication hardware directly translates into reduced communication time and better parallel efficiency.

John Baron

SPEC OMP2012 — An Application Benchmark Suite for Parallel Systems Using OpenMP

SPEC MPI2007—an application benchmark suite for parallel systems using MPI
