This paper presents a novel Network Request Scheduler (NRS) for a large-scale Lustre storage system. It proposes a quantum-based, Object-Based Round Robin (OBRR) NRS algorithm that reorders the execution of I/O requests per data object, presenting a workload to the backend storage that can be optimized more easily. To address the drawback of static deadlines under large-scale workloads, it proposes a novel two-level deadline-setting strategy that not only avoids starvation but also guarantees that urgent I/O requests are serviced within a specified time period. Through a series of simulation experiments with a Lustre simulator, it demonstrates that I/O performance increases by as much as 40% when the OBRR NRS algorithm is used, and that the two-level deadline-setting strategy avoids starvation and ensures that urgent I/O requests are serviced within the required time.
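To make the scheduling idea concrete, the following minimal Python sketch illustrates quantum-based, per-object round-robin dispatch with a two-level deadline check. It is not the paper's implementation; the names `Request` and `obrr_schedule`, the quantum value, and the FIFO order within each object queue are assumptions made for illustration only.

```python
import heapq
from collections import deque, defaultdict
from dataclasses import dataclass, field

@dataclass(order=True)
class Request:
    deadline: float                              # absolute time by which the request should be dispatched
    object_id: int = field(compare=False)        # data object this request targets
    data: str = field(compare=False, default="")

def obrr_schedule(requests, quantum=4, now=0.0):
    """Illustrative object-based round-robin dispatch with two deadline levels."""
    dispatched = []

    # Level 1: requests whose deadline has already expired jump the queue,
    # which avoids starvation and bounds the wait of urgent requests.
    urgent = [r for r in requests if r.deadline <= now]
    heapq.heapify(urgent)
    while urgent:
        dispatched.append(heapq.heappop(urgent))

    # Level 2: remaining requests go into per-object FIFO queues that are
    # served round-robin, up to `quantum` requests per object per round.
    queues = defaultdict(deque)
    for r in requests:
        if r.deadline > now:
            queues[r.object_id].append(r)

    order = deque(sorted(queues))                # round-robin order over object ids
    while order:
        obj = order.popleft()
        q = queues[obj]
        for _ in range(min(quantum, len(q))):
            dispatched.append(q.popleft())
        if q:                                    # object still has pending requests
            order.append(obj)
    return dispatched

# Example: the expired request (object 1) is dispatched first, then object 2's
# requests are drained as a contiguous run.
reqs = [Request(deadline=5.0, object_id=2), Request(deadline=-1.0, object_id=1),
        Request(deadline=3.0, object_id=2)]
print([r.object_id for r in obrr_schedule(reqs, quantum=2, now=0.0)])  # -> [1, 2, 2]
```

Draining each object's queue in runs of up to a quantum presents the backend storage with long sequences of requests for the same object, which is the workload shape the abstract describes as easier to optimize.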
In high-performance computing (HPC), data and metadata are stored on dedicated server nodes, and client applications access them over a network, which introduces network latencies and resource contention. These server nodes are typically equipped with (slow) magnetic disks, while the client nodes store temporary data on fast SSDs or even in non-volatile main memory (NVMM). The full potential of parallel file systems can therefore only be reached if fast client-side storage devices are integrated into the overall storage architecture.
In this article, we propose an NVMM-based hierarchical persistent client cache for the Lustre file system (NVMM-LPCC for short). NVMM-LPCC implements two caching modes: a read-write mode (RW-NVMM-LPCC for short) and a read-only mode (RO-NVMM-LPCC for short). NVMM-LPCC integrates with the Lustre Hierarchical Storage Management (HSM) solution and the Lustre layout lock mechanism to provide consistent persistent caching services for I/O applications running on client nodes, while maintaining a global unified namespace for the entire Lustre file system. The evaluation results presented in this article show that NVMM-LPCC increases the average read throughput by up to 35.80 times and the average write throughput by up to 9.83 times compared with the native Lustre system, while providing excellent scalability.
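The difference between the two caching modes can be sketched with a short, purely hypothetical Python example. The class name and the dict-based stand-ins for the NVMM device and the Lustre backend are assumptions for illustration; they do not reflect the actual NVMM-LPCC code or the Lustre HSM and layout lock interfaces.

```python
class ClientCacheSketch:
    """Hypothetical client-side cache with read-write and read-only modes."""

    def __init__(self, backend, mode="RW"):
        assert mode in ("RW", "RO")
        self.mode = mode
        self.backend = backend   # dict standing in for the backend Lustre file system
        self.nvmm = {}           # dict standing in for the local NVMM cache
        self.dirty = set()       # files modified locally (RW mode only)

    def read(self, path):
        # Both modes serve repeated reads from the local NVMM copy.
        if path not in self.nvmm:
            self.nvmm[path] = self.backend[path]
        return self.nvmm[path]

    def write(self, path, data):
        if self.mode == "RW":
            # RW mode absorbs writes locally and writes them back later.
            self.nvmm[path] = data
            self.dirty.add(path)
        else:
            # RO mode passes writes through and drops the cached copy.
            self.backend[path] = data
            self.nvmm.pop(path, None)

    def flush(self):
        # Write back locally modified files to the backend.
        for path in self.dirty:
            self.backend[path] = self.nvmm[path]
        self.dirty.clear()
```

In this toy model, RW mode is what lets cached writes land on the fast local NVMM device, while RO mode keeps the cache read-only and consistent by always forwarding writes to the backend.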