30th IEEE International Performance Computing and Communications Conference 2011
DOI: 10.1109/pccc.2011.6108062
Enhancing I/O throughput via efficient routing and placement for large-scale parallel file systems

Abstract: As storage systems get larger to meet the demands of petascale systems, careful planning must be applied to avoid congestion points and extract the maximum performance. In addition, the large data sets generated by such systems make it desirable for all compute resources to have common access to this data without needing to copy it to each machine. This paper describes a method of placing I/O close to the storage nodes to minimize contention on Cray's SeaStar2+ network, and extends it to a routed Lustre confi…

Cited by 11 publications (9 citation statements); references 13 publications.
“…We used half (120 GB/s) of the available storage from Spider (Widow1). The achievable aggregate I/O bandwidth is further limited due to congestion on the Cray 3D torus and the InfiniBand fabric, resulting from the Lustre routing algorithms in use during our measurement period [9].…”
Section: Output Absorption On Jaguar
Mentioning confidence: 99%
“…This paper characterizes output burst absorption on Jaguar, a 2.33-petaflop Cray XK6 housed at the Oak Ridge Leadership Computing Facility (OLCF) at Oak Ridge National Laboratory (ORNL). Storage for Jaguar is provided by Spider [9], the 10-petabyte, 240 GB/s Lustre [10] file system at OLCF. The key contribution of our study is to enhance understanding of performance behaviors for state-of-the-art software as currently deployed in a leadership-class facility.…”
Section: Introduction
Mentioning confidence: 99%
“…In total, it contains 18,688 compute nodes, each powered by two quad-core AMD CPUs. The average I/O bandwidth of the whole system is about 80GB/s, and each node can achieve about 4.67MB/s of bandwidth [6]. In a production run of the GTC application at the scale of 16,384 cores on the Jaguar XT5 platform, the application would output 260GB of particle data every 120 seconds [20], with each core producing about 16.25MB per 120 seconds (roughly 1.08MB per second per eight-core node).…”
Section: Theoretical Analysis Results
Mentioning confidence: 99%
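
As a back-of-envelope check of the GTC figures quoted above (260GB per output step, 16,384 cores, two quad-core CPUs per node, 120-second output interval), the short Python sketch below reproduces the per-core and per-node rates. It is only an illustration of the arithmetic, not code from the cited papers; all input values are taken from the citation statements.

# Back-of-envelope check of the GTC output figures quoted above.
# All inputs come from the citation statements; nothing here is measured.

output_gb = 260           # particle data per output step (GB)
cores = 16_384            # cores used in the production run
cores_per_node = 8        # two quad-core AMD CPUs per node
interval_s = 120          # seconds between output steps

output_mb = output_gb * 1024
nodes = cores // cores_per_node            # 2048 nodes

per_core_mb = output_mb / cores            # ~16.25 MB per core per output
per_node_mb = output_mb / nodes            # ~130 MB per node per output
per_node_mb_s = per_node_mb / interval_s   # ~1.08 MB/s per node, averaged

print(f"{per_core_mb:.2f} MB/core per output, "
      f"{per_node_mb:.0f} MB/node per output, "
      f"{per_node_mb_s:.2f} MB/s per node")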
“…In a production run at the scale of 16,384 cores, each core can output roughly two million particles per 120 seconds, resulting in 260GB of particle data per output (130MB per node) [20]. However, the average I/O throughput of its running platform, Jaguar (now Titan) at Oak Ridge National Laboratory, is around 4.7MB/s per node [6]. This difference presents a gap between the application's requirement and system capability.…”
Section: GTC Fusion Modeling Code
Mentioning confidence: 99%
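
Continuing the same arithmetic with the per-node figures quoted above (130MB per output step, ~4.7MB/s average per-node throughput), the sketch below estimates how long one node would take to drain its output burst. The assumption that a node writes its share at the average per-node rate for the whole burst is an illustrative simplification, not a claim from the cited papers.

# Time for one node to drain a 130 MB output burst at the quoted
# ~4.7 MB/s average per-node throughput (assumption: the node writes
# at that average rate for the entire burst).

burst_mb = 130.0        # per-node output per step, from the quote
avg_node_mb_s = 4.7     # average per-node I/O throughput, from the quote
interval_s = 120.0      # output interval

drain_s = burst_mb / avg_node_mb_s      # ~27.7 s
fraction = drain_s / interval_s         # ~23% of the output interval

print(f"drain time ~{drain_s:.1f} s, ~{fraction:.0%} of the output interval")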
“…When an I/O request was issued, it was relayed over multiple hops from peer compute nodes to an I/O router, then traversed the SION network and an Object Storage Server (OSS) before eventually arriving at an OST. Despite the high network bandwidth along the critical path, the extra data copies and data-processing overhead at each hop caused additional delays [15]. Overall, the bandwidth utilization of one OST was 75.6% when there were only two concurrent processes, but dropped to 53.5% when there were 32 processes.…”
Section: Degraded Bandwidth Utilization Due To Contention
Mentioning confidence: 99%
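
To make the quoted utilization figures concrete, a minimal sketch converting them into effective per-OST bandwidth is shown below. The 400MB/s peak per-OST value is a hypothetical placeholder chosen for illustration only; the cited statements give the utilization percentages but not an absolute per-OST peak.

# Effective per-OST bandwidth implied by the quoted utilization figures.
# peak_ost_mb_s is a hypothetical placeholder, NOT a number from the
# cited papers; only the utilization percentages come from the quote.

peak_ost_mb_s = 400.0                   # assumed peak bandwidth of one OST
utilization = {2: 0.756, 32: 0.535}     # concurrent processes -> utilization

for procs, util in utilization.items():
    effective = peak_ost_mb_s * util
    print(f"{procs:>2} processes: {util:.1%} utilization "
          f"-> ~{effective:.0f} MB/s effective per OST")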