Parallel I/O Performance Characterization of Columbia and NEC SX-8 Superclusters

Saini, Subhash; Talcott, Dale; Thakur, Rajeev; Adamidis, Panagiotis; Rabenseifner, Rolf; Ciotti, Robert

doi:10.1109/ipdps.2007.370289

Cited by 9 publications

(6 citation statements)

References 3 publications


“…I/O is also important as most of the applications perform checkpointing, which requires fast writes. Sequential Read Write is a single process I/O benchmark that writes and reads a file using various block sizes [25].…”

Section: Sequential I/O Benchmark (mentioning)

confidence: 99%

An early performance evaluation of many integrated core architecture based SGI rackable computing system

Saini, Jin, Jespersen, et al. (2013)

Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis

Self Cite


Intel recently introduced the Xeon Phi coprocessor, based on the Many Integrated Core architecture and featuring 60 cores with a peak performance of 1.0 Tflop/s. NASA has deployed a 128-node SGI Rackable system in which each node has two Intel Xeon E5-2670 8-core Sandy Bridge processors along with two Xeon Phi 5110P coprocessors. We have conducted an early performance evaluation of the Xeon Phi. We used microbenchmarks to measure the latency and bandwidth of memory and interconnect, I/O rates, and the performance of OpenMP directives and MPI functions. We also used OpenMP and MPI versions of the NAS Parallel Benchmarks along with two production CFD applications to test four programming modes: offload, processor native, coprocessor native, and symmetric (processor plus coprocessor). In this paper we present preliminary results based on our performance evaluation of various aspects of a Phi-based system.
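The Sequential Read Write benchmark quoted above is described only as a single-process benchmark that writes and reads a file using various block sizes. A minimal Python sketch of that idea follows; the helper name `seq_write_read`, the file size, and the block sizes are illustrative assumptions, not the benchmark's actual code or parameters. Reads issued right after the write may be served from the operating system's page cache, so read rates from a sketch like this tend to be optimistic.

```python
from __future__ import annotations

import os
import tempfile
import time


def seq_write_read(path: str, file_size: int, block_size: int) -> tuple[float, float]:
    """Hypothetical helper: write, then read, `file_size` bytes sequentially
    in `block_size` chunks from a single process. Returns (write MB/s, read MB/s)."""
    buf = os.urandom(block_size)
    nblocks = file_size // block_size

    t0 = time.perf_counter()
    with open(path, "wb", buffering=0) as f:  # unbuffered: one write() per block
        for _ in range(nblocks):
            f.write(buf)
        os.fsync(f.fileno())  # count the flush to storage in the write time
    t_write = max(time.perf_counter() - t0, 1e-9)

    t0 = time.perf_counter()
    with open(path, "rb", buffering=0) as f:
        while f.read(block_size):  # read() returns b"" at end of file
            pass
    t_read = max(time.perf_counter() - t0, 1e-9)

    megabytes = nblocks * block_size / 1e6
    return megabytes / t_write, megabytes / t_read


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "seq.dat")
        for bs in (4096, 65536, 1 << 20):  # illustrative block sizes
            w, r = seq_write_read(path, 16 << 20, bs)
            print(f"block {bs:>8} B: write {w:8.1f} MB/s, read {r:8.1f} MB/s")
```

Sweeping the block size is the point of such a benchmark: small blocks expose per-call overhead, while large blocks approach the device's streaming rate.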


“…Saini et al. used I/O benchmarks and applications on SGI Altix and NEC SX-8 superclusters [6]. Using the MADbench2 benchmark, Borrill et al. studied the I/O performance on several supercomputers [7][8].…”

Section: Introduction (mentioning)

confidence: 99%

I/O performance characterization of Lustre and NASA applications on Pleiades

Saini, Rappleye, Chang, et al. (2012)

2012 19th International Conference on High Performance Computing

Self Cite


In this paper we study the performance of the Lustre file system using five scientific and engineering applications representative of the NASA workload on large-scale supercomputing systems such as NASA's Pleiades. To facilitate the collection of Lustre performance metrics, we have developed a software tool that exports a wide variety of client- and server-side metrics using SGI's Performance Co-Pilot (PCP) and generates a human-readable report on key metrics at the end of a batch job. These performance metrics are (…). In this paper, we demonstrate the usefulness of this tool on Pleiades for five production-quality NASA scientific and engineering applications. We compare the latency of read and write operations under Lustre to that under NFS by tracing system calls and signals. We also investigate the read and write policies and study the effect of page cache size on I/O operations. We examine the performance impact of Lustre stripe size and stripe count, and evaluate file-per-process and single-shared-file access by all processes for the NASA workload, using the parameterized IOR benchmark.
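The stripe size and stripe count studied with IOR determine how a file's bytes are spread across Lustre object storage targets (OSTs). As a rough sketch, assuming the plain round-robin layout in which consecutive `stripe_size` chunks go to consecutive stripe objects, the mapping can be computed for a single shared file in which each rank writes one contiguous block. The function names are hypothetical, and the result is a stripe index within the file's layout, not a physical OST number.

```python
from __future__ import annotations


def stripe_index(offset: int, stripe_size: int, stripe_count: int) -> int:
    """Stripe (object) index, 0..stripe_count-1, holding the byte at `offset`,
    assuming simple round-robin striping."""
    return (offset // stripe_size) % stripe_count


def stripes_touched(rank: int, block_size: int, stripe_size: int, stripe_count: int) -> set[int]:
    """Stripe indices touched when `rank` writes one contiguous block of
    `block_size` bytes at offset rank * block_size in a single shared file."""
    start = rank * block_size
    end = start + block_size  # exclusive
    first_chunk = start // stripe_size
    last_chunk = (end - 1) // stripe_size
    return {c % stripe_count for c in range(first_chunk, last_chunk + 1)}


if __name__ == "__main__":
    MiB = 1 << 20
    print(stripe_index(5 * MiB, MiB, 4))               # 1
    print(sorted(stripes_touched(1, 3 * MiB, MiB, 4)))  # [0, 1, 3]
```

With 1 MiB stripes and a stripe count of 4, a rank writing a 3 MiB block touches three of the four stripes, so blocks from neighboring ranks can land on the same OSTs. In the file-per-process case, each rank's file has its own layout instead.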


“…Several I/O benchmarks and an I/O-intensive application were evaluated with their solution on up to 1K processors, with a reported bandwidth of 2 GB/s. Saini et al. [19] ran several I/O benchmarks and synthetic compact application benchmarks on Columbia and NEC SX-8 using up to 512 processors. They conclude that MPI-IO performance depends on access patterns and that I/O is not scalable when all processors access a shared file.…”

Section: Related Work (mentioning)

confidence: 99%

Scalable parallel I/O alternatives for massively parallel partitioned solver systems

Liu, Sahni, et al. (2010)

2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and PhD Forum (IPDPSW)


With the development of high-performance computing, I/O has become the bottleneck for many massively parallel applications. This paper investigates scalable parallel I/O alternatives for massively parallel partitioned solver systems. Typically, such systems have synchronized "loops" and write data in a well-defined block I/O format consisting of a header and a data portion. Our target use for such a parallel I/O subsystem is checkpoint-restart, where writing is by far the most common operation and reading typically happens only during initialization or during a restart after a system failure. We compare four parallel I/O strategies: 1 POSIX File Per Processor (1PFPP), a synchronized parallel I/O library (syncIO), "Poor-Man's" Parallel I/O (PMPIO), and a new "reduced blocking" strategy (rbIO). Performance tests using real CFD solver data from PHASTA (an unstructured-grid finite element Navier-Stokes solver [1]) show that the syncIO strategy can achieve a read bandwidth of 6.6 GB/s on Blue Gene/L using 16K processors, which is significantly faster than the 1PFPP or PMPIO approaches. The serial "token-passing" approach of PMPIO yields a 900 MB/s write bandwidth on 16K processors using 1024 files, and 1PFPP achieves 600 MB/s on 8K processors, while the "reduced-blocking" rbIO strategy achieves an actual write performance of 2.3 GB/s and a perceived, latency-hiding write performance of more than 21,000 GB/s (i.e., 21 TB/s) on a 32,768-processor Blue Gene/L.
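The abstract names PMPIO's serial "token-passing" scheme without detailing it. The sketch below shows the resulting write schedule under three assumptions: ranks are split into equal contiguous groups, each group writes one file, and the token is passed in rank order within each group. `pmpio_file_and_turn` and `baton_rounds` are hypothetical names. Under these assumptions, 16K processors writing 1024 files form groups of 16, so each file is written in 16 serial turns while all 1024 files proceed concurrently.

```python
from __future__ import annotations


def pmpio_file_and_turn(rank: int, nprocs: int, nfiles: int) -> tuple[int, int]:
    """Return (file_index, turn) for `rank`: which file it writes, and how many
    ranks in its group hold the token before it does."""
    if nprocs % nfiles != 0:
        raise ValueError("sketch assumes nprocs is a multiple of nfiles")
    group_size = nprocs // nfiles
    return rank // group_size, rank % group_size


def baton_rounds(nprocs: int, nfiles: int) -> list[list[int]]:
    """Round k lists the ranks writing concurrently: the k-th member of every
    group. Within one file, writes are serialized across rounds."""
    if nprocs % nfiles != 0:
        raise ValueError("sketch assumes nprocs is a multiple of nfiles")
    group_size = nprocs // nfiles
    return [[f * group_size + k for f in range(nfiles)] for k in range(group_size)]


if __name__ == "__main__":
    print(pmpio_file_and_turn(17, 16384, 1024))  # (1, 1)
    print(baton_rounds(8, 2))                    # [[0, 4], [1, 5], [2, 6], [3, 7]]
    print(len(baton_rounds(16384, 1024)))        # 16
```

The number of files controls the trade-off: more files mean more concurrent writers and fewer serial turns, at the cost of more files to manage.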
