Distributed, structured data stores such as Bigtable, HBase, and Cassandra use a cluster of machines, each running a database-like software system called the Tablet Server Storage Layer, or TSSL. A TSSL's performance on each node directly impacts the performance of the entire cluster. In this paper we introduce an efficient, scalable, multi-tier storage architecture for tablet servers. Our system can use any layered mix of storage devices, such as Flash SSDs and magnetic disks. Our experiments show that by using a mix of technologies, performance for certain workloads can be improved beyond configurations using strictly two-tier approaches with a single type of storage technology. We utilized, adapted, and integrated cache-oblivious algorithms and data structures, as well as Bloom filters, to improve scalability significantly. We also support versatile, efficient transactional semantics. We analyzed and evaluated our system against the storage layers of Cassandra and Hadoop HBase, using a wide range of workloads and configurations, from read-optimized to write-optimized, as well as different input sizes. We found that our system is 3–10× faster than existing systems; that using proper data structures, algorithms, and techniques is critical for scalability, especially on modern Flash SSDs; and that one can fully support versatile transactions without sacrificing performance.
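The abstract credits Bloom filters with a significant scalability improvement. The typical role of a Bloom filter in a tablet server is to let a point lookup skip on-disk tablet files that definitely do not contain a key. The following is a minimal illustrative sketch of that idea, not the paper's implementation; class and method names are invented for this example.

```python
import hashlib

class BloomFilter:
    """Illustrative Bloom filter: a TSSL can keep one per on-disk tablet
    file and skip the disk read entirely when might_contain() is False."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, key):
        # Derive num_hashes bit positions from one SHA-256 digest
        # by slicing it into independent 4-byte chunks.
        digest = hashlib.sha256(key.encode()).digest()
        for i in range(self.num_hashes):
            chunk = int.from_bytes(digest[4 * i:4 * i + 4], "big")
            yield chunk % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False => key is definitely absent (no false negatives);
        # True  => key is probably present (false positives possible).
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))
```

Because a negative answer is always correct, the filter converts most lookups for absent keys into pure in-memory checks, which matters most when the lower tiers are slow magnetic disks.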
Abstract: The modern file system is still implemented in the kernel and statically linked with other kernel components. This architecture has brought performance and efficient integration with memory management. However, kernel development is slow, and modern storage systems must support an array of features, including distribution across a network, tagging, searching, deduplication, checksumming, snapshotting, file preallocation, real-time I/O guarantees for media, and more. Moving complex components into user level, however, requires an efficient mechanism for handling page faulting and zero-copy caching, write ordering, synchronous flushes, interaction with the kernel page write-back thread, and secure shared memory. We implement such a system and experiment with a user-level object store built on top of it. Our object store is a complete re-design of the traditional storage stack; it demonstrates the efficiency of our technique and the flexibility it grants to user-level storage systems. Our current prototype file system incurs between 1% and 6% overhead relative to the default native file system, EXT3, for in-cache system workloads, where the native kernel file system design has traditionally found its primary motivation. For update- and insert-intensive metadata workloads that are out-of-cache, we perform 39 times better than the native EXT3 file system, while performing only 2 times worse on out-of-cache random lookups.
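The zero-copy caching and secure shared memory that this abstract calls out both rest on two parties mapping the same physical pages, so data written by one side is immediately visible to the other with no intermediate copy. Below is a minimal sketch of that mechanism using a POSIX anonymous shared mapping and fork; it is an assumption-laden illustration of the general technique, not the paper's kernel/user-level protocol.

```python
import mmap
import os

# One page of memory shared between two processes. Python's mmap with
# fileno=-1 creates an anonymous MAP_SHARED mapping on Unix, so writes
# by the child are visible to the parent without any copy or syscall
# data transfer. (POSIX-only: uses os.fork.)
PAGE = mmap.PAGESIZE
buf = mmap.mmap(-1, PAGE)

pid = os.fork()
if pid == 0:
    # Child: play the role of the producer filling the shared cache page.
    buf.seek(0)
    buf.write(b"cached-block")
    os._exit(0)

# Parent: wait for the producer, then read the same physical page.
os.waitpid(pid, 0)
buf.seek(0)
data = buf.read(12)
```

A real user-level storage stack would layer ordering, flush, and protection semantics on top of such mappings; the point here is only that the page contents cross the process boundary without being copied.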