To meet enterprise and grand challenge-scale performance and interoperability requirements, a group of engineers, initially ad hoc but now integrated into the IETF, is designing extensions to NFSv4 that provide parallel access to storage systems. This paper gives an overview of pNFS, an emerging NFSv4 extension that promises file access scalability plus operating system and storage system independence. pNFS bypasses the server bottleneck by enabling direct access to storage by NFSv4 clients and by providing a framework for the co-existence of NFSv4 with other file access protocols. In this paper, we describe an implementation that demonstrates and validates the potential of pNFS. The I/O throughput of our prototype matches that of its exported file system and far exceeds that of standard NFSv4.
The complexity of cloud-based analytics environments threatens to undermine their otherwise tremendous value. In particular, configuring such environments presents a great challenge. We propose to alleviate this issue with an engine that recommends configurations for a newly submitted analytics job in an intelligent and timely manner. The engine is rooted in a modified k-nearest neighbor algorithm, which finds desirable configurations from similar past jobs that have performed well. We apply the method to configuring an important class of analytics environments: Hadoop on container-driven clouds. Preliminary evaluation suggests that our method could yield a performance gain of up to 28%.
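The abstract does not say how the k-nearest neighbor algorithm is modified, so the following is only a minimal sketch of the plain idea: find the k past jobs most similar to the new one and reuse the configuration of the one that performed best. The feature vectors, the `runtime_s` metric, the configuration keys, and the function name `recommend_config` are all hypothetical and chosen for illustration.

```python
import math


def recommend_config(new_job, history, k=3):
    """Pick a configuration for new_job from its k most similar past jobs.

    new_job: tuple of numeric features (hypothetical, e.g. input size, stages).
    history: list of dicts with "features", "config", and "runtime_s" keys.
    Returns the config of the fastest job among the k nearest neighbors.
    """
    if not history:
        raise ValueError("history must contain at least one past job")

    def distance(a, b):
        # Plain Euclidean distance; a real engine may weight or scale features.
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    # sorted() is stable, so ties keep their order in the history list.
    nearest = sorted(history, key=lambda rec: distance(new_job, rec["features"]))[:k]
    # "Performed well" is assumed here to mean the lowest runtime.
    best = min(nearest, key=lambda rec: rec["runtime_s"])
    return best["config"]


# Illustrative records only; these are not measurements from the paper.
history = [
    {"features": (10.0, 4.0), "config": {"containers": 8, "memory_mb": 2048}, "runtime_s": 120.0},
    {"features": (11.0, 4.0), "config": {"containers": 16, "memory_mb": 4096}, "runtime_s": 95.0},
    {"features": (50.0, 20.0), "config": {"containers": 64, "memory_mb": 8192}, "runtime_s": 60.0},
    {"features": (9.0, 5.0), "config": {"containers": 4, "memory_mb": 1024}, "runtime_s": 150.0},
]

recommended = recommend_config((10.5, 4.0), history, k=2)
```

With a small k, the recommendation comes only from jobs that resemble the new one. Raising k to the size of the history degenerates into "copy the fastest job ever seen", which ignores similarity altogether.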
Workload characterization studies highlight the prevalence of small and sequential data requests in scientific applications. Parallel file systems excel at large data transfers but sometimes at the expense of small I/O performance. pNFS is an NFSv4.1 high-performance enhancement that provides direct storage access to parallel file systems while preserving NFSv4 operating system and hardware platform independence. This paper demonstrates that distributed file systems can increase write throughput to parallel data stores, regardless of file size, by overcoming parallel file system inefficiencies. We also show how pNFS can improve the overall write performance of parallel file systems by using direct, parallel I/O for large write requests and a distributed file system for small write requests. We describe our pNFS prototype and present experiments demonstrating the performance improvements.
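The split between large and small write requests can be pictured as a size-based routing decision. The sketch below assumes a single fixed cutoff; the abstract does not state how the prototype decides, so the 64 KiB threshold, the path labels, and the function names are hypothetical.

```python
# Hypothetical cutoff; the abstract does not give the value used by the prototype.
SMALL_WRITE_THRESHOLD = 64 * 1024  # bytes


def choose_write_path(request_size, threshold=SMALL_WRITE_THRESHOLD):
    """Return which path a write of request_size bytes would take.

    "distributed_fs"  : small write sent through the distributed file system.
    "direct_parallel" : large write sent directly, in parallel, to storage.
    """
    if request_size < 0:
        raise ValueError("request_size must be non-negative")
    return "distributed_fs" if request_size < threshold else "direct_parallel"


def group_by_path(request_sizes, threshold=SMALL_WRITE_THRESHOLD):
    """Group a sequence of write sizes by the path each one would take."""
    groups = {"distributed_fs": [], "direct_parallel": []}
    for size in request_sizes:
        groups[choose_write_path(size, threshold)].append(size)
    return groups


groups = group_by_path([512, 4096, 64 * 1024, 8 * 1024 * 1024])
```

A fixed threshold is the simplest possible policy; an actual client could just as well base the choice on layout information or observed latency.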
IBM Spectrum Scale’s parallel file system, the General Parallel File System (GPFS), has a 20-year development history with over 100 contributing developers. Its ability to support strict POSIX semantics across more than 10K clients leads to a complex design with intricate interactions between the cluster nodes. Tracing has proven to be a vital tool for understanding the behavior and anomalies of such a complex software product. However, the necessary trace information is often buried in hundreds of gigabytes of by-product trace records. Further, the overhead of tracing can significantly impact running applications and file system performance, limiting the use of tracing in a production system. In this research article, we discuss the evolution of the mature and highly scalable GPFS tracing tool and present an exploratory study of GPFS’ new tracing interface, FlexTrace, which allows developers and users to accurately specify what to trace for the problem they are trying to solve. We evaluate our methodology and prototype, demonstrating that the proposed approach has negligible overhead, even under intensive I/O workloads and with low-latency storage devices.