The growing gap between sustained and peak performance for full-scale, complex scientific applications on conventional supercomputers is a major concern in high performance computing (HPC). The problem is expected to worsen by the end of this decade, as mission-critical applications will have computational requirements at least two orders of magnitude larger than current levels. To continue increasing raw computational power while reaping substantial benefits from it, major strides are necessary in hardware architecture, software infrastructure, and application development. The first step toward this goal is the accurate assessment of existing and emerging HPC systems across a comprehensive set of scientific algorithms. In addition, high-fidelity performance modeling is required to understand and predict the complex interactions among hardware, software, and applications, and thereby to influence future design trade-offs. This survey article discusses recent performance evaluations of state-of-the-art ultra-scale systems for a diverse set of scientific applications, as well as scalable compact synthetic benchmarks and architectural probes. Finally, performance models and program characterizations from key scientific areas are described.