Optimal column layout for hybrid workloads

Athanassoulis, Manos; Bøgh, Kenneth S.; Idreos, Stratos

doi:10.14778/3358701.3358707

Cited by 34 publications

(10 citation statements)

References 65 publications


“…The one contains intricate transactions and queries, such as CH-benCHmark [41], CBTR [26], and HTAPBench [42]. The other includes a mix of simple insert/select operations, i.e., ADAPT [43] and HAP [44]. The real-time queries generally involve simple aggregate operations and the analytical queries include more complex operations.…”

Section: Related Work (mentioning)

confidence: 99%

OLxPBench: Real-time, Semantically Consistent, and Domain-specific are Essential in Benchmarking, Designing, and Implementing HTAP Systems

Kang, Wang, Gao et al. 2022

Preprint


As real-time analysis on fresh data becomes increasingly compelling, more organizations deploy Hybrid Transactional/Analytical Processing (HTAP) systems to support real-time queries on data recently generated by online transaction processing. This paper argues that real-time queries, semantically consistent schema, and domain-specific workloads are essential in benchmarking, designing, and implementing HTAP systems. However, most state-of-the-art and state-of-the-practice benchmarks ignore those critical factors. Hence, at best, they are incommensurable and, at worst, misleading in benchmarking, designing, and implementing HTAP systems. This paper presents OLxPBench, a composite HTAP benchmark suite. OLxPBench proposes: (1) the abstraction of a hybrid transaction, performing a real-time query in-between an online transaction, to model a widely observed behavior pattern: making a quick decision while consulting real-time analysis; (2) a semantically consistent schema to express the relationships between OLTP and OLAP schema; (3) the combination of domain-specific and general benchmarks to characterize diverse application scenarios with varying resource demands. Our evaluations justify the three design decisions of OLxPBench and pinpoint the bottlenecks of two mainstream distributed HTAP DBMSs. International Open Benchmark Council (BenchCouncil) sets up the OLxPBench homepage at https://www.benchcouncil.org/olxpbench/. Its source code is available from https://github.com/BenchCouncil/olxpbench.git.


“…Workload: We generate workloads using the benchmark proposed by previous work [11,12]. The benchmark consists of the following queries that are common in HTAP workloads: (Q1) inserts new tuples, (Q2) is a point query that selects a specific row, (Q3) is an aggregate query that computes the maximum values of selected attributes over selected tuples, (Q4) is an arithmetic query that sums a subset of attributes over the selected tuples, and (Q5) is an update query that updates a subset of attributes of a specific row.…”

Section: Evaluation Of Laser (mentioning)

confidence: 99%
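
The five query types named in the statement above can be sketched against a toy in-memory table. This is only an illustration under stated assumptions: the `Table` class, its method names, the key-range selection, and the per-row reading of Q4 are all hypothetical and are not taken from the cited benchmark's code.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """Toy in-memory table mapping key -> list of attribute values (hypothetical)."""
    rows: dict = field(default_factory=dict)

    def insert(self, key, values):
        # Q1: insert a new tuple.
        self.rows[key] = list(values)

    def point_query(self, key, cols):
        # Q2: select a specific row, projecting the requested columns.
        row = self.rows[key]
        return [row[c] for c in cols]

    def _select(self, lo, hi):
        # Assumed selection predicate: keys in the inclusive range [lo, hi].
        return [row for k, row in self.rows.items() if lo <= k <= hi]

    def max_query(self, cols, lo, hi):
        # Q3: maximum of each selected attribute over the selected tuples.
        selected = self._select(lo, hi)
        return [max((row[c] for row in selected), default=None) for c in cols]

    def sum_query(self, cols, lo, hi):
        # Q4: one reading of the description, a per-row sum of the chosen attributes.
        return [sum(row[c] for c in cols) for row in self._select(lo, hi)]

    def update(self, key, new_values):
        # Q5: update a subset of attributes (column index -> value) of one row.
        for c, v in new_values.items():
            self.rows[key][c] = v


t = Table()
for k in range(5):
    t.insert(k, [k, 10 * k, 100 * k])
t.update(3, {1: 7})
```

In this sketch the narrow projections of Q3 and Q4 touch only a few columns of many rows, while Q2 and Q5 touch one row, which is the access-pattern contrast that layout decisions for hybrid workloads have to balance.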

“…Along with the inserts, we issue 100 updates per second, i.e., one percent of the insert rate, via Q5, where a randomly chosen column value is updated for a recently inserted key. This update pattern mimics updates and corrections frequently taking place in mixed analytical and transactional processing [12]. Furthermore, we control the access patterns throughout the data lifecycle by selecting k, v, v_s, and v_e for queries Q2–Q4 such that the upper levels of the LSM-Tree are mostly accessed by point read operations and wider projections, whereas lower levels are accessed by scan operations and narrower projections.…”

Section: Performance Of Laser (mentioning)

confidence: 99%

Real-Time LSM-Trees for HTAP Workloads

Saxena, Golab, Idreos et al. 2021

Preprint

Self Cite


Real-time data analytics systems such as SAP HANA, MemSQL, and IBM Wildfire employ hybrid data layouts, in which data are stored in different formats throughout their lifecycle. Recent data are stored in a row-oriented format to serve OLTP workloads and support high data rates, while older data are transformed to a column-oriented format for OLAP access patterns. We observe that a Log-Structured Merge (LSM) Tree is a natural fit for a lifecycle-aware storage engine due to its high write throughput and level-oriented structure, in which records propagate from one level to the next over time. To build a lifecycle-aware storage engine using an LSM-Tree, we make a crucial modification to allow different data layouts in different levels, ranging from purely row-oriented to purely column-oriented, leading to a Real-Time LSM-Tree. We give a cost model and an algorithm to design a Real-Time LSM-Tree that is suitable for a given workload, followed by an experimental evaluation of LASER, a prototype implementation of our idea built on top of the RocksDB key-value store. In our evaluation, LASER is almost 5x faster than Postgres (a pure row-store) and two orders of magnitude faster than MonetDB (a pure column-store) for real-time data analytics workloads.


“…These tools can be broadly classified as offline workload analysis for index and views design [2,3,22,26,84,93], and periodic online workload analysis [18,75,76,77] to capture workload drift [43]. In addition, there has been research on reducing the magnitude of the search space of tuning [17,27] and on deciding the optional data partitioning [9,65,79,81,82]. These approaches assume that the input information about resources and workload is accurate.…”

Section: Robustness Is All You Need (mentioning)

confidence: 99%

Endure: A Robust Tuning Paradigm for LSM Trees Under Workload Uncertainty

Huynh, Chaudhari, Terzi et al. 2021

Preprint

Self Cite


Log-Structured Merge trees (LSM trees) are increasingly used as the storage engines behind several data systems, frequently deployed in the cloud. Similar to other database architectures, LSM trees take into account information about the expected workload (e.g., reads vs. writes, point vs. range queries) to optimize their performance via tuning. Operating in shared infrastructure like the cloud, however, comes with a degree of workload uncertainty due to multi-tenancy and the fast-evolving nature of modern applications. Systems with static tuning discount the variability of such hybrid workloads and hence provide an inconsistent and overall suboptimal performance. To address this problem, we introduce Endure, a new paradigm for tuning LSM trees in the presence of workload uncertainty. Specifically, we focus on the impact of the choice of compaction policies, size-ratio, and memory allocation on the overall performance. Endure considers a robust formulation of the throughput maximization problem, and recommends a tuning that maximizes the worst-case throughput over a neighborhood of each expected workload. Additionally, an uncertainty tuning parameter controls the size of this neighborhood, thereby allowing the output tunings to be conservative or optimistic. Through both model-based and extensive experimental evaluation of Endure in the state-of-the-art LSM-based storage engine, RocksDB, we show that the robust tuning methodology consistently outperforms classical tuning strategies. We benchmark Endure using 15 workload templates that generate more than 10000 unique noisy workloads. The robust tunings output by Endure lead to up to a 5× improvement in throughput in the presence of uncertainty. On the flip side, when the observed workload exactly matches the expected one, Endure tunings have negligible performance loss.
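
The max-min idea in this abstract (pick the tuning whose worst-case throughput over a neighborhood of the expected workload is highest) can be illustrated with a toy model. Everything concrete here is an assumption made for illustration: the three tunings, their per-operation costs, the single read-fraction workload parameter, and the interval-shaped neighborhood of radius `rho` are hypothetical and are not the paper's cost model or neighborhood definition.

```python
# Hypothetical per-operation costs for three candidate tunings (made-up numbers).
TUNINGS = {
    "read_optimized": {"read": 1.0, "write": 4.0},
    "write_optimized": {"read": 3.0, "write": 1.5},
    "balanced": {"read": 1.8, "write": 2.2},
}


def throughput(tuning, read_frac):
    """Operations per unit time for a workload with the given read fraction."""
    cost = TUNINGS[tuning]
    avg_cost = read_frac * cost["read"] + (1.0 - read_frac) * cost["write"]
    return 1.0 / avg_cost


def neighborhood(expected_read_frac, rho, steps=10):
    """Read fractions within +/- rho of the expected one, clipped to [0, 1]."""
    lo = max(0.0, expected_read_frac - rho)
    hi = min(1.0, expected_read_frac + rho)
    return [lo + (hi - lo) * i / steps for i in range(steps + 1)]


def nominal_tuning(expected_read_frac):
    """Classical choice: best throughput at exactly the expected workload."""
    return max(TUNINGS, key=lambda t: throughput(t, expected_read_frac))


def robust_tuning(expected_read_frac, rho):
    """Robust choice: best worst-case throughput over the neighborhood."""
    def worst_case(t):
        return min(throughput(t, w) for w in neighborhood(expected_read_frac, rho))
    return max(TUNINGS, key=worst_case)
```

With these made-up costs, `nominal_tuning(0.8)` selects the read-optimized tuning, while `robust_tuning(0.8, 0.4)` selects the balanced one because it degrades least when the read fraction drifts. Setting `rho` to 0 collapses the neighborhood to the expected workload, so the robust and nominal choices coincide.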
