The storage subsystem has undergone tremendous innovation to keep up with the ever-increasing demand for throughput. Non-Volatile Memory Express (NVMe)-based solid-state devices are the latest development in this domain, delivering unprecedented performance in terms of latency and peak bandwidth. NVMe drives are expected to be particularly beneficial for I/O-intensive applications, with databases being one of the prominent use cases. This paper provides the first in-depth performance analysis of NVMe drives. Combining driver instrumentation with system monitoring tools, we present a breakdown of access times for I/O requests throughout the entire system. Furthermore, we present a detailed, quantitative analysis of all the factors contributing to the low-latency, high-throughput characteristics of NVMe drives, including the system software stack. Lastly, we characterize the performance of multiple cloud databases (both relational and NoSQL) on state-of-the-art NVMe drives and compare it to their performance on enterprise-class SATA-based SSDs. We show that NVMe-backed database applications deliver up to 8× better client-side performance than enterprise-class SATA-based SSDs.
In recent years, a number of benchmark suites have been created for the "Big Data" domain, and many such applications fit the client-server paradigm. Much of the recent literature on characterizing "Big Data" applications has focused on two extremes of the characterization spectrum. On one hand, multiple studies have focused on client-side performance; these involve fine-tuning server-side parameters for an application to obtain the best client-side performance. At the other extreme, characterization studies pick one set of client-side parameters and then report the server microarchitectural statistics under those assumptions. While both ends of the spectrum present interesting results, this paper argues that they are insufficient, and in some cases undesirable, for driving system-wide architectural decisions in datacenter design. This paper shows that, for the purpose of designing an efficient datacenter, detailed microarchitectural characterization of "Big Data" applications is overkill. It identifies four main system-level macro-architectural features and shows that these features are more representative of an application's system-level behavior. To this end, a number of datacenter applications from a variety of benchmark suites are evaluated and classified according to these macro-architectural features. Based on this analysis, the paper further shows that each application class will benefit from a very different server configuration, leading to a highly efficient, cost-effective datacenter.