While SSDs (Solid State Drives) are widely used for storage in many computing systems due to their small average latency, how to provide timing guarantees for a delay-sensitive (real-time) task on a real-time system equipped with an SSD has not been fully explored. A recent study proposed a work-constraining I/O scheduler, called K2, which succeeds in reducing the tail latency of a real-time task at the expense of the total bandwidth available to real-time and non-real-time tasks. Although the queue length bound parameter of the K2 scheduler is key to regulating the tradeoff between a decrease in the tail latency of the real-time task and a penalty in total bandwidth, the parameter's impact on this tradeoff has not been thoroughly investigated. In particular, no studies have addressed how a fully occupied SSD that incurs garbage collection changes the performance of the K2 scheduler in terms of the tail latency of the real-time task and the total bandwidth. In this paper, we systematically analyze the performance of the K2 scheduler for different I/O operation types, based on experiments on Linux. We investigate how the performance changes on a fully occupied SSD due to garbage collection. Building on this investigation, we draw general guidelines on how to select a proper setting of the queue length bound for better performance. Finally, we propose how to apply the guidelines to achieve target objectives that optimize the tail latency of the real-time task and the total bandwidth at the same time, which has not been achieved by previous studies.
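To make the tradeoff the abstract describes concrete, the following toy Python model sketches why a work-constraining queue bound cuts real-time tail latency while costing total bandwidth. All constants and formulas here are illustrative assumptions, not measurements or the paper's actual analysis: an arriving real-time request is assumed to wait behind at most q in-flight requests, while total bandwidth grows with however much device parallelism the bound leaves usable.

# Toy model of a work-constraining I/O scheduler (in the spirit of K2).
# Numbers and formulas are illustrative assumptions, not measurements.

def rt_tail_latency_us(q_bound, service_us=100.0):
    """Worst case: an RT request waits behind q_bound in-flight requests."""
    return (q_bound + 1) * service_us

def total_bandwidth_mbps(q_bound, device_parallelism=32, peak_mbps=3000.0):
    """Throughput grows with usable parallelism until the device saturates."""
    return peak_mbps * min(q_bound, device_parallelism) / device_parallelism

for q in (1, 2, 4, 8, 16, 32):
    print(f"q={q:2d}  tail latency ~{rt_tail_latency_us(q):6.0f} us"
          f"  bandwidth ~{total_bandwidth_mbps(q):6.0f} MB/s")

Under these assumptions, a small bound minimizes the work an urgent request can get stuck behind, and a large bound maximizes device utilization; the paper's guidelines concern picking q between those extremes.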
With data-intensive artificial intelligence (AI) and machine learning (ML) applications rapidly surging, modern high-performance embedded systems, with heterogeneous computing resources, critically demand low-latency and high-bandwidth data communication. As such, the newly emerging NVMe (Non-Volatile Memory Express) protocol, with parallel queuing, access prioritization, and optimized I/O arbitration, is being widely adopted as a de facto fast I/O communication interface. However, effectively leveraging the potential of modern NVMe storage proves nontrivial and demands fine-grained control, high processing concurrency, and application-specific optimization. Fortunately, modern FPGA devices, capable of efficient parallel processing and application-specific programmability, readily meet the underlying physical-layer requirements of the NVMe protocol, providing unprecedented opportunities for implementing a rich-featured NVMe middleware to benefit modern high-performance embedded computing. In this article, we rethink existing access mechanisms for NVMe storage and devise innovative hardware-assisted solutions to accelerate NVMe data access for high-performance embedded computing systems. Our key idea is to exploit the massively parallel I/O queuing capability of the NVMe storage system by leveraging FPGAs' reconfigurability and native hardware computing power to operate transparently to the main processor. Specifically, our DirectNVM system aims to provide effective hardware constructs for high-performance and scalable userspace storage applications by (1) hardening all the essential NVMe driver functionalities, thereby avoiding expensive OS syscalls and enabling zero-copy data access from the application, (2) relying on hardware for I/O communication control instead of OS-level interrupts, which significantly reduces both total I/O latency and its variance, and (3) exposing cutting-edge, application-specific weighted-round-robin I/O traffic scheduling to userspace. To validate our design methodology, we developed a complete DirectNVM system on the Xilinx Zynq MPSoC architecture, which incorporates a high-performance application processor (APU) equipped with DDR4 system memory and a hardened configurable PCIe Gen3 block in its programmable logic. We then measured the storage bandwidth and I/O latency of both our DirectNVM system and a conventional OS-based system running the standard FIO benchmark suite [2]. Compared against the PetaLinux built-in kernel driver running on a Zynq MPSoC, DirectNVM achieves up to 18.4× higher throughput and up to 4.5× lower latency. To ensure a fair performance comparison, we also measured DirectNVM against Intel SPDK [26], a highly optimized userspace asynchronous NVMe I/O framework, running on an x86 PC system. Our experimental results show that DirectNVM, even running on a considerably less powerful embedded ARM processor than a full-scale AMD processor, achieves up to 2.2× higher throughput and 1.3× lower latency. Furthermore, with a multi-threading test case, we demonstrate that DirectNVM's weighted-round-robin scheduling can significantly optimize bandwidth allocation between latency-constrained frontend applications and other backend applications in real-time systems. Finally, we develop a theoretical framework for performance modeling based on classic queuing theory that quantitatively relates a system's I/O performance to its I/O implementation.
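As a minimal sketch of the kind of classic queuing-theory model the abstract alludes to, the Python snippet below uses an M/M/1 approximation to relate submission rate and per-request service time to mean latency. The M/M/1 choice and all rates are assumptions for illustration; the paper's actual framework may differ.

# M/M/1 queuing model: a common first approximation for relating I/O
# submission rate and per-request service time to mean latency.
# Using M/M/1 here is an illustrative assumption, not DirectNVM's model.

def mm1_mean_latency_us(arrival_rate_kiops, service_us):
    """Mean response time T = 1 / (mu - lambda) for an M/M/1 queue."""
    lam = arrival_rate_kiops * 1e3   # arrival rate, requests per second
    mu = 1e6 / service_us            # service rate, requests per second
    if lam >= mu:
        raise ValueError("queue is unstable: arrival rate >= service rate")
    return 1e6 / (mu - lam)          # seconds -> microseconds

# Latency explodes as utilization rho = lambda/mu approaches 1:
for kiops in (100, 400, 700, 900, 950):
    print(f"{kiops} kIOPS -> mean latency {mm1_mean_latency_us(kiops, 1.0):.1f} us")

The qualitative takeaway such a model captures is that any implementation overhead that inflates the effective service time (syscalls, interrupts, copies) pushes the saturation knee to lower throughput, which is the relationship the abstract's framework quantifies.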
Low-latency I/O services are essential for latency-sensitive workloads when they co-run with throughput-oriented workloads in cloud data centers. Although advanced SSDs such as Intel Optane SSDs can offer ultra-low latency at the device layer, I/O interference among workloads throughout the I/O stack can still significantly enlarge I/O latency. How to best utilize ultra-low-latency SSDs in cloud computing environments remains an open problem. In this paper, we analyze the entire I/O stack and reveal that I/O interference is mainly attributed to resource contention in the SSD device, transaction commits in the file system, and costly process scheduling. To address these problems, we propose FastResponse, a holistic approach to using ultra-low-latency SSDs for latency-sensitive workloads. First, we propose a new I/O scheduler at the block layer to throttle the I/O requests of throughput-oriented workloads and thus reduce resource contention in the SSD device. Second, we develop a fine-grained journaling scheme to reduce transaction latency at the file system layer. Third, we redesign the Completely Fair Scheduler (CFS) to raise the priority of latency-sensitive processes. We implement FastResponse in the Linux kernel and evaluate it with several mixed workloads. Compared with vanilla Linux and the state-of-the-art SelectISR, FastResponse reduces the average response time of latency-sensitive workloads by 18-70% and 10-67%, respectively, and reduces the 99.9th-percentile response time by 58-80% and 52-78%, respectively. Meanwhile, the performance degradation for throughput-oriented workloads is less than 6%.
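FastResponse's first technique throttles best-effort I/O at the block layer. A generic way to express such throttling is a token bucket, sketched below in Python; this is a common admission-control pattern, not FastResponse's actual scheduler, and all names and rates here are assumptions.

import time

# Illustrative token-bucket throttle for throughput-oriented I/O.
# A generic sketch of block-layer admission control, not the actual
# FastResponse scheduler; all names and rates are assumptions.

class TokenBucket:
    def __init__(self, rate_iops, burst):
        self.rate = rate_iops        # tokens replenished per second
        self.capacity = burst        # maximum burst size
        self.tokens = burst
        self.last = time.monotonic()

    def try_admit(self):
        """Admit one best-effort request if a token is available."""
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # defer it, keeping the device queue short for latency-sensitive I/O

bucket = TokenBucket(rate_iops=10_000, burst=64)
admitted = sum(bucket.try_admit() for _ in range(100))
print(f"admitted {admitted} of 100 back-to-back best-effort requests")

Capping the rate at which throughput-oriented requests reach the device keeps its internal queues short, which is what lets a latency-sensitive request avoid waiting behind a long backlog.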