Persistent key-value (KV) stores are largely built on the Log-Structured Merge (LSM) tree for high write performance, yet the LSM-tree suffers from inherently high I/O amplification. KV separation mitigates I/O amplification by storing only keys in the LSM-tree and values in separate storage. However, the current KV separation design remains inefficient under update-intensive workloads due to its high garbage collection (GC) overhead in value storage. We propose HashKV, which aims for high update performance atop KV separation under update-intensive workloads. HashKV uses hash-based data grouping, which deterministically maps values to storage space so as to make both updates and GC efficient. We further relax the restriction of such deterministic mapping via simple yet useful design extensions. We extensively evaluate various design aspects of HashKV. We show that HashKV achieves 4.6× the update throughput and 53.4% less write traffic compared to the current KV separation design. In addition, we demonstrate that the design of HashKV can be integrated into state-of-the-art KV stores to improve their respective performance.
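To make the idea of hash-based data grouping concrete, the following is a minimal sketch (not HashKV's actual implementation) of how a deterministic mapping can assign each key's value to a fixed partition of value storage; the hash function and partition count are illustrative assumptions. Because every update of a key lands in the same partition, GC can reclaim space partition by partition instead of scanning the whole value store.

```python
import hashlib

NUM_PARTITIONS = 64  # hypothetical number of fixed-size value groups


def partition_of(key: bytes) -> int:
    """Deterministically map a key to a value-storage partition."""
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:4], "little") % NUM_PARTITIONS


# All versions of a key's value land in the same partition, so updates
# and GC for that key are confined to one bounded region of storage.
assert partition_of(b"user:42") == partition_of(b"user:42")
```

The mapping is stateless: any node can recompute a value's location from the key alone, which is what makes both update placement and GC efficient in such a design.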
Introduction

Persistent key-value (KV) stores are an integral part of modern large-scale storage infrastructures for storing massive structured data (e.g., [4,6,11,22]). While many real-world KV storage workloads are read-intensive (e.g., the Get-Update request ratio can reach 30× in Facebook's Memcached workloads [2]), update-intensive workloads are also dominant in many storage scenarios, including online transaction processing [47] and enterprise servers [21]. Field studies show that the proportion of write requests is becoming more significant in modern enterprise workloads. For example, Yahoo! reports that its low-latency workloads increasingly shift from reads to writes [42]; Baidu reports that the read-write request ratio of a cloud storage workload is 2.78× [22]; and Microsoft reports that the read-write traffic ratio of a 3-month OneDrive workload is 2.3× [7].

Modern KV stores optimize the performance of writes (including inserts and updates) using the Log-Structured Merge (LSM) tree [35]. Its idea is to transform updates into sequential writes through a log-structured (append-only) design [40], while supporting efficient queries, including individual key lookups and range scans. In a nutshell, the LSM-tree buffers written KV pairs and flushes them into a multi-level tree, in which each node is a fixed-size file containing sorted KV pairs and their metadata. It stores the recently written KV pairs at higher tree levels and merges them with lower tree levels via compaction. The LSM-tree design not only improves write performance by avoiding small random updates (which are also harmful to the endurance of solid-state drives (SSDs) [1,33]), but also improves range scan performance by keeping sorted KV pairs in each node.
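The buffer-flush-compact cycle described above can be sketched as follows. This is an illustrative toy model, not the actual LevelDB/RocksDB implementation: writes accumulate in an in-memory buffer (the memtable), a full buffer is flushed as an immutable sorted run (a sequential write), and compaction merges runs while discarding stale versions of each key; the tiny buffer threshold is an assumption for demonstration.

```python
MEMTABLE_LIMIT = 4  # tiny flush threshold, for illustration only


class TinyLSM:
    """Toy LSM-tree: an in-memory buffer plus immutable sorted runs."""

    def __init__(self):
        self.memtable = {}  # buffered KV pairs (most recent writes)
        self.runs = []      # "on-disk" sorted runs, newest first

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= MEMTABLE_LIMIT:
            self._flush()

    def _flush(self):
        # Flush the buffer as a sorted, immutable run: one sequential
        # write instead of many small random updates.
        self.runs.insert(0, sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key):
        # Search newest data first; the first hit is the latest version.
        if key in self.memtable:
            return self.memtable[key]
        for run in self.runs:
            for k, v in run:
                if k == key:
                    return v
        return None

    def compact(self):
        # Merge all runs, keeping only the newest value per key. Rewriting
        # still-live data here is the source of the LSM-tree's I/O
        # amplification, which grows with value size.
        merged = {}
        for run in reversed(self.runs):  # oldest first; newer overwrites
            merged.update(dict(run))
        self.runs = [sorted(merged.items())]
```

Because each run stores KV pairs in sorted order, range scans only need to merge a few sorted sequences; but compaction repeatedly rewrites entire KV pairs, which is precisely the cost that KV separation reduces by keeping large values out of the tree.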