Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data 2020
DOI: 10.1145/3318464.3389770

Qd-tree: Learning Data Layouts for Big Data Analytics

Abstract: Corporations today collect data at an unprecedented and accelerating scale, making the need to run queries on large datasets increasingly important. Technologies such as columnar block-based data organization and compression have become standard practice in most commercial database systems. However, the problem of best assigning records to data blocks on storage is still open. For example, today's systems usually partition data by arrival time into row groups, or range/hash partition the data based on selected…
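
As a rough illustration of why the assignment of records to blocks matters, the sketch below compares two layouts of the same column under simple min/max block skipping. It is not the Qd-tree algorithm; the data, block size, query range, and helper names (`make_blocks`, `blocks_scanned`) are all hypothetical and chosen only for illustration.

```python
import random

def make_blocks(records, block_size):
    """Split records (already in storage order) into fixed-size blocks."""
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]

def blocks_scanned(blocks, lo, hi):
    """Count blocks whose [min, max] range overlaps the query range [lo, hi].

    A block whose min/max metadata does not overlap the range can be skipped.
    """
    return sum(1 for b in blocks if min(b) <= hi and max(b) >= lo)

random.seed(0)
# Hypothetical column values, listed in arrival order.
values = [random.randrange(1000) for _ in range(10_000)]

# Layout 1: blocks filled in arrival order (analogous to row groups by arrival time).
arrival_blocks = make_blocks(values, 100)
# Layout 2: blocks filled after sorting on the column (a simple range partitioning).
sorted_blocks = make_blocks(sorted(values), 100)

query = (200, 250)  # hypothetical range predicate on the column
arrival_count = blocks_scanned(arrival_blocks, *query)
sorted_count = blocks_scanned(sorted_blocks, *query)
print("arrival-order blocks scanned:", arrival_count)
print("range-partitioned blocks scanned:", sorted_count)
```

With uniformly random arrival order, almost every block contains some value in the query range, so few blocks can be skipped; the sorted layout confines matching values to a handful of blocks. A single sort key only helps predicates on that column, which is one reason choosing a layout for a whole query workload is harder than this sketch suggests.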

Citations: cited by 71 publications (39 citation statements)
References: 38 publications
“…We decided not to evaluate against these because Flood already showed consistent superiority over them [30]. We also do not evaluate against other learned multi-dimensional indexes because they are either optimized for disk [25,46] or optimize only based on the data distribution, not the query workload [9,44] (see §7).…”
Section: Implementation and Setup (mentioning)
confidence: 99%
“…Learning has also been applied to the challenge of reducing I/O cost for disk-based multi-dimensional indexes. Qd-tree [46] uses reinforcement learning to construct a partitioning strategy that minimizes the number of disk-based blocks accessed by a query. LISA [25] is a disk-based learned spatial index that achieves low storage consumption and I/O cost while supporting range queries, nearest neighbor queries, and insertions and deletions.…”
Section: Related Work (mentioning)
confidence: 99%
“…Notably, DRL incorporates deep learning (DL) techniques to handle complex unstructured data and has been designed to learn from historical data and self-exploration to solve notoriously hard and large-scale problems (e.g., AlphaGo [15]). In recent years, researchers from different communities have proposed DRL solutions to address issues in data processing and analytics [4], [16], [17]. We categorize existing works using DRL from two perspectives: system and application.…”
Section: Introduction (mentioning)
confidence: 99%