Fast storage devices are an emerging solution to satisfy data-intensive applications. They provide high transaction rates for DBMS, low response times for Web servers, instant on-demand paging for applications with large memory footprints, and many similar advantages for performance-hungry applications. In spite of the benefits promised by fast hardware, modern operating systems are not yet structured to take advantage of the hardware's full potential. The software overhead caused by an OS, negligible in the past, adversely impacts application performance, lessening the advantage of using such hardware. Our analysis demonstrates that the overheads from the traditional storage-stack design are significant and cannot easily be overcome without modifying the hardware interface and adding new capabilities to the operating system.In this article, we propose six optimizations that enable an OS to fully exploit the performance characteristics of fast storage devices. With the support of new hardware interfaces, our optimizations minimize per-request latency by streamlining the I/O path and amortize per-request latency by maximizing parallelism inside the device. We demonstrate the impact on application performance through well-known storage benchmarks run against a Linux kernel with a customized SSD. We find that eliminating context switches in the I/O path decreases the software overhead of an I/O request from 20 microseconds to 5 microseconds and a new request merge scheme called Temporal Merge enables the OS to achieve 87% to 100% of peak device performance, regardless of request access patterns or types. Although the performance improvement by these optimizations on a standard SATA-based SSD is marginal (because of its limited interface and relatively high response times), our sensitivity analysis suggests that future SSDs with lower response times will benefit from these changes. The effectiveness of our optimizations encourages discussion between the OS community and storage vendors about future device interfaces for fast storage devices. Portions of this work appeared in the 4th USENIX Workshop on Hot Topics in Storage and File Systems [Yu et al. 2012] and in Y. J. Yu's Ph.D. dissertation [Yu 2012].
Native Command Queueing (NCQ) is an optimization technology to maximize throughput by reordering requests inside a disk drive. It has been so successful that NCQ has become the standard in SATA 2 protocol specification, and the great majority of disk vendors have adopted it for their recent disks. However, there is a possibility that the technology may lead to an information gap between the OS and a disk drive. A NCQ-enabled disk tries to optimize throughput without realizing the intention of an OS, whereas the OS does its best under the assumption that the disk will do as it is told without specific knowledge regarding the details of the disk mechanism. Let us call this expectation discord , which may cause serious problems such as request starvations or performance anomaly. In this article, we (1) confirm that expectation discord actually occurs in real systems; (2) propose software-level approaches to solve them; and (3) evaluate our mechanism. Experimental results show that our solution is simple, cheap (no special hardware required), portable, and effective.
Storage area network (SAN) is one of the most popular solutions for constructing server environments these days. In these kinds of server environments, HDD-based storage usually becomes the bottleneck of the overall system, but it is not enough to merely replace the devices with faster ones in order to exploit their high performance. In other words, proper optimizations are needed to fully utilize their performance gains. In this work, we first adopted a DRAM-based SSD as a fast backend-storage in the existing SAN environment, and found significant performance degradation compared to its own capabilities, especially in the case of small-sized random I/O pattern, even though a high-speed network was used. We have proposed three optimizations to solve this problem: (1) removing software overhead in the SAN I/O path; (2) increasing parallelism in the procedures for handling I/O requests; and (3) adopting the temporal merge mechanism to reduce network overheads. We have implemented them as a prototype and found that our approaches make substantial performance improvements by up to 39% and 280% in terms of both the latency and bandwidth, respectively.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2025 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.