Qd-tree: Learning Data Layouts for Big Data Analytics

Yang, Zongheng; Chandramouli, Badrish; Wang, Chi; Gehrke, Johannes; Li, Yinan; Minhas, Umar Farooq; Larson, Per-Åke; Kossmann, Donald; Acharya, Rajeev

doi:10.1145/3318464.3389770

Cited by 71 publications

(39 citation statements)

References 38 publications

“…We decided not to evaluate against these because Flood already showed consistent superiority over them [30]. We also do not evaluate against other learned multi-dimensional indexes because they are either optimized for disk [25,46] or optimize only based on the data distribution, not the query workload [9,44] (see §7).…”

Section: Implementation and Setup (mentioning)

“…Learning has also been applied to the challenge of reducing I/O cost for disk-based multi-dimensional indexes. Qd-tree [46] uses reinforcement learning to construct a partitioning strategy that minimizes the number of disk-based blocks accessed by a query. LISA [25] is a disk-based learned spatial index that achieves low storage consumption and I/O cost while supporting range queries, nearest neighbor queries, and insertions and deletions.…”

Section: Related Work (mentioning)
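The statement above summarizes qd-tree as learning a partitioning that minimizes the number of blocks a query reads. The sketch below illustrates only the layout idea, under loudly stated simplifications. It does not use reinforcement learning. Instead it swaps in a simple greedy heuristic that picks, from a hypothetical list of candidate cuts of the form `column < value`, whichever cut most reduces the rows scanned by a hypothetical workload, subject to a minimum block size. Each node records the hyper-rectangle (`box`) its rows fall in, so a query reads only the leaf blocks whose box it overlaps. All names (`Node`, `build`, `blocks_accessed`, `min_block`) are illustrative and not taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Row = Dict[str, float]
Box = Dict[str, Tuple[float, float]]    # column -> half-open range [lo, hi)
Query = Dict[str, Tuple[float, float]]  # column -> half-open range [lo, hi)
Cut = Tuple[str, float]                 # (column, value): left child gets column < value


@dataclass
class Node:
    box: Box                  # region of the data space this node covers
    rows: List[Row]           # rows routed to this node (a block, if it is a leaf)
    cut: Optional[Cut] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def intersects(box: Box, query: Query) -> bool:
    """True if the query range overlaps the box, i.e. the block may hold matches."""
    for col, (qlo, qhi) in query.items():
        blo, bhi = box[col]
        if qhi <= blo or qlo >= bhi:
            return False
    return True


def rows_scanned(box: Box, n_rows: int, workload: List[Query]) -> int:
    """Rows read from one block over the workload: all of them whenever it is not skipped."""
    return sum(n_rows for q in workload if intersects(box, q))


def apply_cut(node: Node, cut: Cut):
    """Split a node's box and rows on `column < value`."""
    col, val = cut
    lo, hi = node.box[col]
    left_box, right_box = dict(node.box), dict(node.box)
    left_box[col], right_box[col] = (lo, val), (val, hi)
    left_rows = [r for r in node.rows if r[col] < val]
    right_rows = [r for r in node.rows if r[col] >= val]
    return (left_box, left_rows), (right_box, right_rows)


def build(node: Node, workload: List[Query], candidates: List[Cut], min_block: int) -> Node:
    """Greedily apply the cut that most reduces rows scanned; stop when no cut helps."""
    best_cost = rows_scanned(node.box, len(node.rows), workload)
    best = None
    for cut in candidates:
        col, val = cut
        lo, hi = node.box[col]
        if not lo < val < hi:
            continue  # the cut does not divide this node's region
        (lb, lr), (rb, rr) = apply_cut(node, cut)
        if len(lr) < min_block or len(rr) < min_block:
            continue  # would create a block smaller than the minimum size
        cost = rows_scanned(lb, len(lr), workload) + rows_scanned(rb, len(rr), workload)
        if cost < best_cost:
            best_cost, best = cost, (cut, lb, lr, rb, rr)
    if best is not None:
        node.cut, lb, lr, rb, rr = best
        node.left = build(Node(lb, lr), workload, candidates, min_block)
        node.right = build(Node(rb, rr), workload, candidates, min_block)
    return node


def leaves(node: Node) -> List[Node]:
    if node.cut is None:
        return [node]
    return leaves(node.left) + leaves(node.right)


def blocks_accessed(root: Node, query: Query) -> List[Node]:
    """Leaf blocks the query must read; every other block is skipped via its box."""
    return [leaf for leaf in leaves(root) if intersects(leaf.box, query)]


# Usage: 100 rows, one query on x < 25, three hypothetical candidate cuts.
rows = [{"x": float(i), "y": float(i % 10)} for i in range(100)]
root = build(Node({"x": (0.0, 100.0), "y": (0.0, 10.0)}, rows),
             workload=[{"x": (0.0, 25.0)}],
             candidates=[("x", 25.0), ("x", 50.0), ("y", 5.0)],
             min_block=20)
print(root.cut, len(leaves(root)), len(blocks_accessed(root, {"x": (0.0, 25.0)})))
# prints: ('x', 25.0) 2 1
```

A learned approach would replace the greedy loop in `build` with a policy that chooses the cuts; the routing of rows and the box-based block skipping would stay the same.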

“…To support dynamic data, each leaf node in the Grid Tree could maintain a sibling node that acts as a delta index [39] in which inserts, updates, and deletes are buffered and periodically merged into the main node. Persistence: Tsunami's techniques for reducing query skew and handling correlations are not restricted to in-memory scenarios and could be incorporated into an index for data resident on disk or SSD, perhaps by combining ideas from qd-tree [46] or LISA [25].…”

Section: Future Work (mentioning)
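The future-work statement above describes a leaf that keeps a sibling delta index where inserts, updates, and deletes are buffered and periodically merged. The following is a minimal, generic sketch of that buffering pattern, assuming a leaf stores a sorted array of unique `(key, value)` pairs and merges once the buffer reaches a hypothetical `merge_threshold`. It is not Tsunami's design (the citing paper only proposes the idea), and the class and method names are invented for illustration.

```python
import bisect
from typing import Dict, List, Optional, Tuple

_DELETED = object()  # tombstone recording a buffered delete


class LeafWithDelta:
    """Hypothetical leaf: a sorted main array plus a sibling delta buffer of pending changes."""

    def __init__(self, items: List[Tuple[int, str]], merge_threshold: int = 4):
        self.main: List[Tuple[int, str]] = sorted(items)
        self.delta: Dict[int, object] = {}  # key -> latest value, or _DELETED
        self.merge_threshold = merge_threshold

    def upsert(self, key: int, value: str) -> None:
        self.delta[key] = value  # inserts and updates are buffered the same way
        self._maybe_merge()

    def delete(self, key: int) -> None:
        self.delta[key] = _DELETED
        self._maybe_merge()

    def get(self, key: int) -> Optional[str]:
        if key in self.delta:  # buffered changes are newer than the main array
            value = self.delta[key]
            return None if value is _DELETED else value
        i = bisect.bisect_left(self.main, (key,))
        if i < len(self.main) and self.main[i][0] == key:
            return self.main[i][1]
        return None

    def scan(self, lo: int, hi: int) -> List[Tuple[int, str]]:
        """Entries with lo <= key < hi, with buffered changes applied."""
        start = bisect.bisect_left(self.main, (lo,))
        end = bisect.bisect_left(self.main, (hi,))
        result = dict(self.main[start:end])
        for key, value in self.delta.items():
            if lo <= key < hi:
                if value is _DELETED:
                    result.pop(key, None)
                else:
                    result[key] = value
        return sorted(result.items())

    def _maybe_merge(self) -> None:
        """Fold the delta into the main array once it grows to the threshold."""
        if len(self.delta) < self.merge_threshold:
            return
        merged = dict(self.main)
        for key, value in self.delta.items():
            if value is _DELETED:
                merged.pop(key, None)
            else:
                merged[key] = value
        self.main = sorted(merged.items())
        self.delta.clear()
```

Reads must consult both structures, so a larger threshold makes writes cheaper (fewer rewrites of the main array) at the cost of more work per lookup and scan.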

“…To address the shortcomings of traditional indexes, recent work has proposed the idea of learned multi-dimensional indexes [9,25,30,44,46]. In particular, Flood [30] is an in-memory multi-dimensional index that automatically optimizes its structure to achieve high performance on a particular dataset and workload.…”

Section: Introduction (mentioning)

Tsunami

et al. 2020

Filtering data based on predicates is one of the most fundamental operations for any modern data warehouse. Techniques to accelerate the execution of filter expressions include clustered indexes, specialized sort orders (e.g., Z-order), multi-dimensional indexes, and, for high selectivity queries, secondary indexes. However, these schemes are hard to tune and their performance is inconsistent. Recent work on learned multi-dimensional indexes has introduced the idea of automatically optimizing an index for a particular dataset and workload. However, the performance of that work suffers in the presence of correlated data and skewed query workloads, both of which are common in real applications. In this paper, we introduce Tsunami, which addresses these limitations to achieve up to 6X faster query performance and up to 8X smaller index size than existing learned multi-dimensional indexes, in addition to up to 11X faster query performance and 170X smaller index size than optimally-tuned traditional indexes.
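The abstract lists specialized sort orders such as Z-order among the standard ways to accelerate filters. As a minimal illustration, assuming two non-negative integer coordinates (a real system would first map column values to integers, for example by rank), a Z-order (Morton) key interleaves the bits of the coordinates so that rows close in both dimensions tend to be stored close together. The function name `z_order`, the `bits` parameter, and the choice of placing x in the even bit positions are assumptions of this sketch.

```python
def z_order(x: int, y: int, bits: int = 16) -> int:
    """Morton (Z-order) code: bit i of x goes to position 2i, bit i of y to 2i + 1."""
    if not (0 <= x < (1 << bits) and 0 <= y < (1 << bits)):
        raise ValueError("coordinates must be non-negative and fit in `bits` bits")
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


# Sorting a 4 x 4 grid by Z-value visits it quadrant by quadrant.
points = [(x, y) for x in range(4) for y in range(4)]
points.sort(key=lambda p: z_order(*p))
print(points[:4])  # prints: [(0, 0), (1, 0), (0, 1), (1, 1)]
```

Because a single sort key covers both columns, a range filter on either column can skip some blocks, though less precisely than a sort on that column alone; this is the kind of tuning trade-off the abstract refers to.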

“…Notably, DRL incorporates deep learning (DL) techniques to handle complex unstructured data and has been designed to learn from historical data and self-exploration to solve notoriously hard and large-scale problems (e.g., AlphaGo [15]). In recent years, researchers from different communities have proposed DRL solutions to address issues in data processing and analytics [4], [16], [17]. We categorize existing works using DRL from two perspectives: system and application.…”

Section: Introduction (mentioning)

A Survey on Deep Reinforcement Learning for Data Processing and Analytics

Cai, Cui, Xiong, et al. 2022

IEEE Trans. Knowl. Data Eng.

Data processing and analytics are fundamental and pervasive. Algorithms play a vital role in data processing and analytics, and many algorithm designs have incorporated heuristics and general rules from human knowledge and experience to improve their effectiveness. Recently, reinforcement learning, and deep reinforcement learning (DRL) in particular, has been increasingly explored and exploited in many areas because it can learn better strategies than statically designed algorithms in the complicated environments it interacts with. Motivated by this trend, we provide a comprehensive review of recent works focusing on utilizing DRL to improve data processing and analytics. First, we present an introduction to key concepts, theories, and methods in DRL. Next, we discuss DRL deployment on database systems, facilitating data processing and analytics in various aspects, including data organization, scheduling, tuning, and indexing. Then, we survey the application of DRL in data processing and analytics, ranging from data preparation and natural language processing to healthcare, fintech, etc. Finally, we discuss important open challenges and future research directions of using DRL in data processing and analytics.

DRL at the Application and Service Layer

2023

Deep Reinforcement Learning for Wireless Communications and Networking
