Estimating join selectivities using bandwidth-optimized kernel density models

Kiefer, Martin; Heimel, Max; Breß, Sebastian; Markl, Volker

doi:10.14778/3151106.3151112

Cited by 58 publications (55 citation statements)

References 33 publications (62 reference statements)


“…To support the shifts in workload and dataset, they update the bandwidth after each incoming query and design the new sample maintenance method for insert-only workload and updates/deletions workload. Furthermore, in Kiefer et al [48] extend the method into estimating the selectivity of join. They design two different models: single model over the join samples and the models over the base tables, which does not need the join operation and estimates the selectivity of join with the independent assumption.…”

Section: Unsupervised Methods (mentioning)

confidence: 99%

A Survey on Advancing the DBMS Query Optimizer: Cardinality Estimation, Cost Model, and Plan Enumeration

2021


Query optimizer is at the heart of the database systems. Cost-based optimizer studied in this paper is adopted in almost all current database systems. A cost-based optimizer introduces a plan enumeration algorithm to find a (sub)plan, and then uses a cost model to obtain the cost of that plan, and selects the plan with the lowest cost. In the cost model, cardinality, the number of tuples through an operator, plays a crucial role. Due to the inaccuracy in cardinality estimation, errors in cost model, and the huge plan space, the optimizer cannot find the optimal execution plan for a complex query in a reasonable time. In this paper, we first deeply study the causes behind the limitations above. Next, we review the techniques used to improve the quality of the three key components in the cost-based optimizer, cardinality estimation, cost model, and plan enumeration. We also provide our insights on the future directions for each of the above aspects.
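The survey's summary of this line of work (the bandwidth is updated after each incoming query, and join selectivity can be estimated from base-table models under an independence assumption) can be illustrated with a minimal sketch. It assumes a one-dimensional Gaussian kernel, a squared-error loss, and a single plain gradient-descent step per query; the function names, learning rate, and loss are illustrative assumptions, not the authors' implementation. The last helper shows only the generic independence assumption for a filtered join (selection selectivities multiplied by an unfiltered join selectivity), which is not necessarily how Kiefer et al. combine their base-table models.

```python
import math


def norm_cdf(z: float) -> float:
    """Standard normal CDF, computed with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def norm_pdf(z: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def kde_range_selectivity(sample, h, lo, hi):
    """Mass of a 1-D Gaussian KDE (bandwidth h) inside [lo, hi]."""
    return sum(norm_cdf((hi - x) / h) - norm_cdf((lo - x) / h)
               for x in sample) / len(sample)


def update_bandwidth(sample, h, lo, hi, true_sel, lr=0.1, h_min=1e-6):
    """One gradient-descent step on (estimate - true_sel)**2 w.r.t. h.

    Hypothetical feedback loop: after a query executes, its true
    selectivity is compared with the estimate and h is nudged.
    """
    est = kde_range_selectivity(sample, h, lo, hi)
    # d/dh Phi((c - x) / h) = -pdf((c - x) / h) * (c - x) / h**2
    d_est = sum(-norm_pdf((hi - x) / h) * (hi - x) / h ** 2
                + norm_pdf((lo - x) / h) * (lo - x) / h ** 2
                for x in sample) / len(sample)
    d_loss = 2.0 * (est - true_sel) * d_est
    return max(h_min, h - lr * d_loss)  # keep the bandwidth positive


def join_selectivity_independent(sel_r, sel_s, base_join_sel):
    """Filtered-join selectivity under an independence assumption:
    predicates on R and S are assumed independent of each other and of
    the join key, so their selectivities simply multiply."""
    return sel_r * sel_s * base_join_sel
```

For example, with a sample of `[0.0]`, the range `[-1, 1]`, and a true selectivity of 0.9, the estimate at `h = 1` is about 0.68, so one call to `update_bandwidth` shrinks `h` and moves the estimate toward 0.9.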



“…Therefore, it would be a natural idea of combining data-driven and query-driven models. As discussed before, the existing proposals leveraging both data and query workload [19,30,37,39] are insufficient towards this direction. An idea to overcome the problem of data-driven methods suffering the tail of the distribution due to their averaging optimization target would be using ensemble methods with each component targeting a different part of the distribution.…”

Section: Overview (mentioning)

confidence: 99%

“…In fact, a few proposals (e.g., DeepDB) consider the combination as an interesting avenue for future work. Moreover, towards this direction several solutions [19,30,37,39] have been proposed to utilize both data and workload.…”

Section: Introduction (mentioning)

confidence: 99%

A Unified Deep Model of Learning from both Data and Queries for Cardinality Estimation

Wu, Cong

2021

Proceedings of the 2021 International Conference on Management of Data


Cardinality estimation is a fundamental problem in database systems. To capture the rich joint data distributions of a relational table, most of the existing work either uses data as unsupervised information or uses query workload as supervised information. Very little work has been done to use both types of information, and cannot fully make use of both types of information to learn the joint data distribution. In this work, we aim to close the gap between data-driven and query-driven methods by proposing a new unified deep autoregressive model, UAE, that learns the joint data distribution from both the data and query workload. First, to enable using the supervised query information in the deep autoregressive model, we develop differentiable progressive sampling using the Gumbel-Softmax trick. Second, UAE is able to utilize both types of information to learn the joint data distribution in a single model. Comprehensive experimental results demonstrate that UAE achieves single-digit multiplicative error at tail, better accuracies over state-of-the-art methods, and is both space and time efficient.
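The Gumbel-Softmax trick mentioned in this abstract replaces a non-differentiable categorical draw with a temperature-controlled softmax over Gumbel-perturbed logits. The following is a minimal pure-Python sketch of drawing one relaxed sample; it illustrates only the trick itself, not UAE's differentiable progressive sampling, and the names and the temperature parameter `tau` are illustrative assumptions. In a real model the gradient would flow through this expression via an autodiff framework.

```python
import math
import random


def gumbel_softmax(logits, tau=1.0, rng=random):
    """Relaxed one-hot sample approximating a draw from softmax(logits).

    Each logit is perturbed with Gumbel(0, 1) noise g = -log(-log(u)),
    u ~ Uniform(0, 1), then passed through a softmax with temperature tau.
    Smaller tau pushes the output closer to a one-hot vector.
    """
    noisy = []
    for logit in logits:
        u = rng.random()
        u = min(max(u, 1e-12), 1.0 - 1e-12)  # avoid log(0)
        noisy.append((logit - math.log(-math.log(u))) / tau)
    m = max(noisy)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]
```

Passing a seeded `random.Random` instance as `rng` makes the draw reproducible, which is convenient when comparing estimates across runs.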


“…The problem of join selectivity estimation has been extensively studied in the relational database [1,7,15,16,18,33,34]. Particularly, these studies can be divided into three classes.…”

Section: Introduction (mentioning)

confidence: 99%

Selectivity Estimation for Relation-Tree Joins

Zhang

2020

32nd International Conference on Scientific and Statistical Database Management


Estimating the join selectivity is a crucial problem in many aspects of query processing, such as query optimization and query refinement. Selectivity estimation has been extensively studied for the relational joins in SQL queries and structural joins in path-oriented queries. However, as leading databases have supported the multi-model data management on relational and tree-structured data together, a new problem has arisen: the existing estimation techniques mainly work for a single model but not for the heterogeneous situation due to the cross-model joins. A straightforward combination of existing estimators cannot provide a satisfactory estimation quality. This paper studies the problem of selectivity estimation for cross-model joins with relational and tree-structured data. Our estimator is based on the Kernel Density Estimation (KDE) model, which is a statistical approach using a data sample to approximate multivariate probability distribution. KDE has been successfully applied in relational databases to estimate the selectivity of range and join query. In this work, we propose an estimation method called location-value estimation (LVE) model based on KDE, which considers both value joins and structural joins in relational and tree-structured data. To boost the estimation efficiency in large data samples, we further propose the max-min approximation (MMA) and grid-based approximation (GBA) models to approximate the KDE contribution. Extensive experiments on four real and synthetic datasets demonstrate the effectiveness, efficiency, and scalability of our techniques.
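The abstract's description of KDE (a data sample approximating a multivariate distribution, used for range selectivity) can be made concrete with a minimal sketch. It assumes a product of one-dimensional Gaussian kernels with one bandwidth per dimension, so each sample point's mass inside an axis-aligned box factorizes into per-dimension CDF differences. This is a generic textbook estimator with hypothetical names, not the LVE model from this paper nor the code of Kiefer et al.

```python
import math


def norm_cdf(z: float) -> float:
    """Standard normal CDF, computed with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def kde_box_selectivity(sample, bandwidths, lows, highs):
    """Estimated fraction of rows inside an axis-aligned box.

    sample:     list of d-dimensional tuples drawn from the table
    bandwidths: per-dimension Gaussian bandwidths h_j (diagonal kernel)
    lows/highs: per-dimension range bounds of the query predicate
    """
    total = 0.0
    for point in sample:
        mass = 1.0
        # Product kernel: the box mass factorizes across dimensions.
        for x, h, lo, hi in zip(point, bandwidths, lows, highs):
            mass *= norm_cdf((hi - x) / h) - norm_cdf((lo - x) / h)
        total += mass
    return total / len(sample)
```

Multiplying the returned fraction by the table's row count gives a cardinality estimate; the quality of that estimate depends heavily on the chosen bandwidths, which is the parameter that bandwidth optimization targets.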

