HyperDex

Escriva, Robert; Wong, Bernard; Sirer, Emin Gün

doi:10.1145/2342356.2342360

Cited by 102 publications

(5 citation statements)

References 43 publications


“…LSM-tree data stores: Today's monolithic LSM-tree systems [17,22,26,37,52,63] require a few memtables to saturate the bandwidth of one disk. Nova-LSM uses a large number of memtables to saturate the disk bandwidth of multiple StoCs.…”

Section: Related Work (mentioning)

confidence: 99%

Nova-LSM: A Distributed, Component-based LSM-tree Key-value Store

Huang,

Ghandeharizadeh

2021

Preprint


The cloud infrastructure motivates disaggregation of monolithic data stores into components that are assembled together based on an application's workload. This study investigates disaggregation of an LSM-tree key-value store into components that communicate using RDMA. These components separate storage from processing, enabling processing components to share storage bandwidth and space. The processing components scatter blocks of a file (SSTable) across an arbitrary number of storage components and balance load across them using power-of-d. They construct ranges dynamically at runtime to parallelize compaction and enhance performance. Each component has configuration knobs that control its scalability. The resulting component-based system, Nova-LSM, is elastic. It outperforms its monolithic counterparts, both LevelDB and RocksDB, by several orders of magnitude with workloads that exhibit a skewed pattern of access to data.


“…Therefore, certain queries, such as range queries and prefix matching, cannot be supported efficiently. In practice, storage systems like HyperDex [40] and HBase [45] reorganize data once written, to create an ordered storage layout. Parallel sorting algorithms like SDS-Sort [38] also re-create order by making multiple passes over data once written.…”

Section: :23 (mentioning)

confidence: 99%

Streaming Data Reorganization at Scale with DeltaFS Indexed Massive Directories

et al. 2020


Complex storage stacks providing data compression, indexing, and analytics help leverage the massive amounts of data generated today to derive insights. It is challenging to perform this computation, however, while fully utilizing the underlying storage media. This is because, while storage servers with large core counts are widely available, single-core performance and memory bandwidth per core grow slower than the core count per die. Computational storage offers a promising solution to this problem by utilizing dedicated compute resources along the storage processing path. We present DeltaFS Indexed Massive Directories (IMDs), a new approach to computational storage. DeltaFS IMDs harvest available (i.e., not dedicated) compute, memory, and network resources on the compute nodes of an application to perform computation on data. We demonstrate the efficiency of DeltaFS IMDs by using them to dynamically reorganize the output of a real-world simulation application across 131,072 CPU cores. DeltaFS IMDs speed up reads by 1,740× while only slightly slowing down the writing of data during simulation I/O for in situ data processing.


“…We also compared RAMCloud with HyperDex [Escriva et al 2012] and Redis [2014], which are high-performance in-memory key-value stores. Redis keeps all of its data in DRAM and uses logging for durability, like RAMCloud.…”

Section: Redis and HyperDex (mentioning)

confidence: 99%

“…However, it offers only weak durability guarantees: the local log is written with a 1-second fsync interval, and updates to replicas are batched and sent in the background (Redis also offers a synchronous update mode, but this degrades performance significantly). HyperDex [Escriva et al 2012] offers similar durability and consistency to RAMCloud, and it supports a richer data model, including range scans and efficient searches across multiple columns. However, it is a disk-based system.…”

Section: Redis and HyperDex (mentioning)

confidence: 99%

The RAMCloud Storage System

Ousterhout

Gopalan

Gupta

et al. 2015

ACM Trans. Comput. Syst.



RAMCloud is a storage system that provides low-latency access to large-scale datasets. To achieve low latency, RAMCloud stores all data in DRAM at all times. To support large capacities (1PB or more), it aggregates the memories of thousands of servers into a single coherent key-value store. RAMCloud ensures the durability of DRAM-based data by keeping backup copies on secondary storage. It uses a uniform log-structured mechanism to manage both DRAM and secondary storage, which results in high performance and efficient memory usage. RAMCloud uses a polling-based approach to communication, bypassing the kernel to communicate directly with NICs; with this approach, client applications can read small objects from any RAMCloud storage server in less than 5μs, and durable writes of small objects take about 13.5μs. RAMCloud does not keep multiple copies of data online; instead, it provides high availability by recovering from crashes very quickly (1 to 2 seconds). RAMCloud's crash recovery mechanism harnesses the resources of the entire cluster working concurrently so that recovery performance scales with cluster size.

Over the past 15 years, the use of DRAM in storage systems has accelerated, driven by the needs of large-scale Web applications. These applications manipulate very large datasets with an intensity that cannot be satisfied by disk and flash alone. As a result, applications are keeping more and more of their long-term data in DRAM. By 2005, all of the major Web search engines kept their search indexes entirely in DRAM, and large-scale caching systems such as memcached [Memcached 2011] have become widely used for applications such as Facebook, Twitter, Wikipedia, and YouTube.

Although DRAM's role is increasing, it is still difficult for application developers to capture the full performance potential of DRAM-based storage. In many cases, DRAM is used as a cache for some other storage system, such as a database; this approach forces developers to manage consistency between the cache and the backing store, and its performance is limited by cache misses and backing store overheads. In other cases, DRAM is managed in an application-specific fashion, which provides high performance but at a high complexity cost for developers. A few recent systems such as Redis [2014] and Cassandra [2014] have begun to provide general-purpose facilities for accessing data in DRAM, but their performance does not approach the full potential of DRAM-based storage.

This article describes RAMCloud, a general-purpose distributed storage system that keeps all data in DRAM at all times. RAMCloud combines three overall attributes: low latency, large scale, and durability. When used with state-of-the-art networking, RAMCloud offers exceptionally low latency for remote access. In our 80-node development cluster with QDR Infiniband, a client can read any 100-byte object in less than 5μs, and durable writes take about 13.5μs. In a large datacenter with 100,000 nodes, we expect small reads to compl...

