2019
DOI: 10.14778/3358701.3358707

Optimal column layout for hybrid workloads

Abstract: Data-intensive analytical applications need to support both efficient reads and writes. However, what is usually a good data layout for an update-heavy workload, is not well-suited for a read-mostly one and vice versa. Modern analytical data systems rely on columnar layouts and employ delta stores to inject new data and updates. We show that for hybrid workloads we can achieve close to one order of magnitude better performance by tailoring the column layout design to the data and query workload. Our approach n…

Cited by 34 publications (10 citation statements)
References 65 publications
“…The one contains intricate transactions and queries, such as CH-benCHmark [41], CBTR [26], and HTAPBench [42]. The other includes a mix of simple insert/select operations, i.e., ADAPT [43] and HAP [44]. The real-time queries generally involve simple aggregate operations and the analytical queries include more complex operations.…”
Section: Related Work (mentioning)
confidence: 99%
“…Workload: We generate workloads using the benchmark proposed by previous work [11,12]. The benchmark consists of the following queries that are common in HTAP workloads: (Q1) inserts new tuples, (Q2) is a point query that selects a specific row, (Q3) is an aggregate query that computes the maximum values of selected attributes over selected tuples, (Q4) is an arithmetic query that sums a subset of attributes over the selected tuples, and (Q5) is an update query that updates a subset of attributes of a specific row.…”
Section: Evaluation Of Laser (mentioning)
confidence: 99%
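The five query types named in the statement above can be illustrated with a minimal Python sketch. It is not the benchmark's actual implementation: the `ToyTable` class, its method names, the in-memory dictionary storage, and the choice of a half-open key range as the selection predicate are all assumptions made for illustration. In particular, Q4 is read here as a per-row sum across the chosen attributes, which is one plausible interpretation of the quoted description.

```python
from typing import Dict, List, Sequence

Row = Dict[str, int]


class ToyTable:
    """Hypothetical in-memory table keyed by an integer row key."""

    def __init__(self, attributes: Sequence[str]) -> None:
        self.attributes = list(attributes)
        self.rows: Dict[int, Row] = {}

    def q1_insert(self, key: int, values: Sequence[int]) -> None:
        # Q1: insert a new tuple.
        self.rows[key] = dict(zip(self.attributes, values))

    def q2_point(self, key: int, attrs: Sequence[str]) -> Row:
        # Q2: point query that selects one specific row.
        row = self.rows[key]
        return {a: row[a] for a in attrs}

    def q3_max(self, lo: int, hi: int, attrs: Sequence[str]) -> Row:
        # Q3: maximum of each selected attribute over rows with lo <= key < hi
        # (the key-range predicate is an assumption).
        selected = [r for k, r in self.rows.items() if lo <= k < hi]
        if not selected:
            return {}
        return {a: max(r[a] for r in selected) for a in attrs}

    def q4_sum(self, lo: int, hi: int, attrs: Sequence[str]) -> List[int]:
        # Q4: per-row sum of a subset of attributes over the selected rows
        # (assumed interpretation), returned in key order.
        return [
            sum(r[a] for a in attrs)
            for k, r in sorted(self.rows.items())
            if lo <= k < hi
        ]

    def q5_update(self, key: int, new_values: Row) -> None:
        # Q5: update a subset of attributes of one specific row.
        self.rows[key].update(new_values)
```

A real HTAP engine would run these operations against a columnar or LSM-based layout rather than a dictionary; the sketch only fixes the semantics of each query type.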
“…Along with the inserts, we issue 100 updates per second, i.e., one percent of the insert rate, via Q5, where a randomly chosen column value is updated for a recently inserted key. This update pattern mimics updates and corrections frequently taking place in mixed analytical and transactional processing [12]. Furthermore, we control the access patterns throughout the data lifecycle by selecting k, v, v_s, and v_e for queries Q2-Q4 such that the upper levels of the LSM-Tree are mostly accessed by point read operations and wider projections, whereas lower levels are accessed by scan operations and narrower projections.…”
Section: Performance Of Laser (mentioning)
confidence: 99%
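The rates quoted above imply an insert rate of 10,000 per second, since 100 updates per second is stated to be one percent of it. The sketch below generates one second of such a workload under stated assumptions: the function name, the operation tuples, the size of the "recently inserted" window, and the ordering of inserts before updates within the second are all hypothetical and not taken from the cited work.

```python
import random
from collections import deque
from typing import Deque, List, Sequence, Tuple

UPDATES_PER_SECOND = 100
UPDATE_PERCENT_OF_INSERTS = 1
# 100 updates/s is 1% of the insert rate, so the insert rate is 10,000/s.
INSERTS_PER_SECOND = UPDATES_PER_SECOND * 100 // UPDATE_PERCENT_OF_INSERTS


def one_second_of_operations(
    next_key: int,
    recent: Deque[int],
    columns: Sequence[str],
    rng: random.Random,
) -> Tuple[List[tuple], int]:
    """Return the operations for one second and the next unused key."""
    ops: List[tuple] = []
    for _ in range(INSERTS_PER_SECOND):
        ops.append(("insert", next_key))
        recent.append(next_key)  # bounded deque keeps only recent keys
        next_key += 1
    for _ in range(UPDATES_PER_SECOND):
        # Update one randomly chosen column of a recently inserted key.
        key = recent[rng.randrange(len(recent))]
        ops.append(("update", key, rng.choice(columns)))
    return ops, next_key
```

For example, calling it with `recent = deque(maxlen=1000)` restricts updates to the 1,000 most recently inserted keys; that window size is an arbitrary illustrative choice.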
“…These tools can be broadly classified as offline workload analysis for index and views design [2,3,22,26,84,93], and periodic online workload analysis [18,75,76,77] to capture workload drift [43]. In addition, there has been research on reducing the magnitude of the search space of tuning [17,27] and on deciding the optional data partitioning [9,65,79,81,82]. These approaches assume that the input information about resources and workload is accurate.…”
Section: Robustness Is All You Need (mentioning)
confidence: 99%