SC14: International Conference for High Performance Computing, Networking, Storage and Analysis 2014
DOI: 10.1109/sc.2014.23
Best Practices and Lessons Learned from Deploying and Operating Large-Scale Data-Centric Parallel File Systems

Cited by 37 publications (15 citation statements). References 5 publications.
“…Titan's file system, called Spider 2, is based on Lustre, an object-based parallel file system software that is deployed on ∼75% of the top 100 systems [36]. Spider 2 has 32 PB of data storage and above 1 TB/s peak I/O bandwidth [31]. This section summarizes Titan/Spider 2 based on materials from [13,30,39,40].…”
Section: Titan and Its Lustre File System
confidence: 99%
“…Even though fprof is designed to run with multiple processes on multiple nodes to scale, there exist practical concerns and constraints for deploying and running it on a production system. For instance, due to the centralized metadata management architecture in Lustre [22], excessive metadata scanning operations might adversely impact foreground file system operations. To this end, OLCF ran fprof on a single client node to profile the Lustre-based Spider II file system, while at LC, fprof was run on multiple nodes, resulting in a significant performance improvement.…”
Section: Deployment
confidence: 99%
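The deployment tradeoff described above — a single scanning client to spare a centralized metadata server versus many parallel scanners for throughput — can be sketched with a minimal parallel directory walk. This is not fprof's actual implementation; `profile_tree` and its `max_workers` knob are hypothetical illustrations of how worker count caps the metadata-request load placed on the file system.

```python
import os
from concurrent.futures import ProcessPoolExecutor

def scan_dir(path):
    """Scan one directory non-recursively; return (subdirs, file_count).
    Each os.scandir call issues metadata requests, so the number of
    concurrent workers bounds the load on a centralized metadata server."""
    subdirs, n_files = [], 0
    try:
        with os.scandir(path) as it:
            for entry in it:
                if entry.is_dir(follow_symlinks=False):
                    subdirs.append(entry.path)
                else:
                    n_files += 1
    except PermissionError:
        pass  # skip unreadable directories, as a profiler must
    return subdirs, n_files

def profile_tree(root, max_workers=1):
    """Breadth-first parallel walk counting files under root.
    max_workers=1 mimics the conservative single-client deployment;
    a larger value mimics the multi-node deployment (hypothetical
    knob, not an fprof option)."""
    total, frontier = 0, [root]
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        while frontier:
            results = list(pool.map(scan_dir, frontier))
            frontier = [d for subdirs, _ in results for d in subdirs]
            total += sum(n for _, n in results)
    return total
```

Either setting returns the same count; the choice only trades scan speed against metadata-server pressure, which is the constraint the citing authors describe.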
“…We ran fprof on the OLCF's center-wide Spider II file system [22] and the lscratche file system in LC [6] in May 2017. Note the difference in file system architectures of the two HPC centers outlined in Table 1.…”
Section: Profiling and Analysis
confidence: 99%
“…Extrapolating from here, the expected Spider 2 performance should be at most 250 GB/s under such bursty production workloads [10].…”
Section: Spider 2 Usage Stats
confidence: 99%