Persistent key-value (KV) stores are largely built on the Log-Structured Merge (LSM) tree for high write performance, yet the LSM-tree suffers from inherently high I/O amplification. KV separation mitigates I/O amplification by storing only keys in the LSM-tree and values in separate storage. However, the current KV separation design remains inefficient under update-intensive workloads due to its high garbage collection (GC) overhead in value storage. We propose HashKV, which aims for high update performance atop KV separation under update-intensive workloads. HashKV uses hash-based data grouping, which deterministically maps values to storage space so as to make both updates and GC efficient. We further relax the restriction of such deterministic mapping via simple yet useful design extensions. We extensively evaluate various design aspects of HashKV. We show that HashKV achieves 4.6× the update throughput and 53.4% less write traffic compared to the current KV separation design. In addition, we demonstrate that the design of HashKV can be integrated into state-of-the-art KV stores to improve their respective performance.
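To make the idea of hash-based data grouping concrete, the following is a minimal sketch (not HashKV's actual implementation) of how a deterministic mapping can assign each key's value to a fixed partition of value storage; the hash function and partition count are illustrative assumptions. Because every update of a key lands in the same partition, GC can reclaim space partition by partition instead of scanning the whole value store.

```python
import hashlib

NUM_PARTITIONS = 64  # hypothetical number of fixed-size value groups


def partition_of(key: bytes) -> int:
    """Deterministically map a key to a value-storage partition."""
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:4], "little") % NUM_PARTITIONS


# All versions of a key's value land in the same partition, so updates
# and GC for that key are confined to one bounded region of storage.
assert partition_of(b"user:42") == partition_of(b"user:42")
```

The mapping is stateless: any node can recompute a value's location from the key alone, which is what makes both update placement and GC efficient in such a design.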
Introduction

Persistent key-value (KV) stores are an integral part of modern large-scale storage infrastructures for storing massive structured data (e.g., [4,6,11,22]). While many real-world KV storage workloads are read-intensive (e.g., the Get-Update request ratio can reach 30× in Facebook's Memcached workloads [2]), update-intensive workloads are also dominant in many storage scenarios, including online transaction processing [47] and enterprise servers [21]. Field studies show that the proportion of write requests is becoming more significant in modern enterprise workloads. For example, Yahoo! reports that its low-latency workloads increasingly shift from reads to writes [42]; Baidu reports that the read-write request ratio of a cloud storage workload is 2.78× [22]; and Microsoft reports that the read-write traffic ratio of a 3-month OneDrive workload is 2.3× [7].

Modern KV stores optimize the performance of writes (including inserts and updates) using the Log-Structured Merge (LSM) tree [35]. Its idea is to transform updates into sequential writes through a log-structured (append-only) design [40], while supporting efficient queries, including individual key lookups and range scans. In a nutshell, the LSM-tree buffers written KV pairs and flushes them into a multi-level tree, in which each node is a fixed-size file containing sorted KV pairs and their metadata. It stores the recently written KV pairs at higher tree levels and merges them with lower tree levels via compaction. The LSM-tree design not only improves write performance by avoiding small random updates (which are also harmful to the endurance of solid-state drives (SSDs) [1,33]), but also improves range scan performance by keeping sorted KV pairs in each node.
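The buffer-flush-compact cycle described above can be sketched as follows. This is an illustrative toy model, not the actual LevelDB/RocksDB implementation: writes accumulate in an in-memory buffer (the memtable), a full buffer is flushed as an immutable sorted run (a sequential write), and compaction merges runs while discarding stale versions of each key; the tiny buffer threshold is an assumption for demonstration.

```python
MEMTABLE_LIMIT = 4  # tiny flush threshold, for illustration only


class TinyLSM:
    """Toy LSM-tree: an in-memory buffer plus immutable sorted runs."""

    def __init__(self):
        self.memtable = {}  # buffered KV pairs (most recent writes)
        self.runs = []      # "on-disk" sorted runs, newest first

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= MEMTABLE_LIMIT:
            self._flush()

    def _flush(self):
        # Flush the buffer as a sorted, immutable run: one sequential
        # write instead of many small random updates.
        self.runs.insert(0, sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key):
        # Search newest data first; the first hit is the latest version.
        if key in self.memtable:
            return self.memtable[key]
        for run in self.runs:
            for k, v in run:
                if k == key:
                    return v
        return None

    def compact(self):
        # Merge all runs, keeping only the newest value per key. Rewriting
        # still-live data here is the source of the LSM-tree's I/O
        # amplification, which grows with value size.
        merged = {}
        for run in reversed(self.runs):  # oldest first; newer overwrites
            merged.update(dict(run))
        self.runs = [sorted(merged.items())]
```

Because each run stores KV pairs in sorted order, range scans only need to merge a few sorted sequences; but compaction repeatedly rewrites entire KV pairs, which is precisely the cost that KV separation reduces by keeping large values out of the tree.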