Cardinality estimation in DBMS

Han, Yuxing; Wu, Zhanjun; Wu, Peizhi; Zhu, Rong; Yang, Jingyi; Tan, Liang Wei; Zeng, Kai; Cong, Gao; Qin, Yanzhao; Pfadler, Andreas; Qian, Zhuzhong; Zhou, Jingren; Li, Jiangneng; Cui, Bin

doi:10.14778/3503585.3503586

Cited by 41 publications

(8 citation statements)

References 37 publications


“…The cardinality estimation problem for database queries has recently been tackled using mainly classical machine learning algorithms [20]. While the results of these methods are more accurate than traditional non-machine learning-based approaches, they still suffer from high training and inference costs while learning multivariate correlations [15,48], and can mitigate these "side effects" only when using single-table statistics as an input [51]. This is still an unsolved problem, and therefore, it is uncertain if such models will prevail in real databases, especially in dynamic environments [48].…”

Section: Related Work (mentioning)

confidence: 99%

“…Learned cardinalities estimators (LCEs) are exactly attempting to mitigate these issues with the help of machine learning methods. Although deep learning seems to give significant improvement compared to classical methods on datasets with more complicated data distributions and join schemas, it still requires hundreds of millions of learnable parameters that are often hard to tune [15].…”

Section: Introduction (mentioning)

confidence: 99%


QardEst: Using Quantum Machine Learning for Cardinality Estimation of Join Queries

Kittelmann, Sulimov, Stockinger, 2024

Workshop on Quantum Computing and Quantum-Inspired Technology for Data-Intensive Systems and Applications


“…a query set can be cast as the multi-query-dataset cardinality estimation (MCE) problem. So, we also review existing work on single-query-dataset cardinality estimation (SCE), which can be categorized as query-driven methods and data-driven methods [21,30]. Query-driven methods [17,22,31,41,46,47,49,52,53] deploy discriminative models trained on a set of historic queries to predict the cardinality for a single query.…”

Section: Related Work (mentioning)

confidence: 99%

“…Data-driven methods [24,37,39,51,54,56,57,60] deploy generative models trained on data without using query workloads. In general, query-driven methods are inflexible, especially when representative queries are unavailable, and data-driven methods can achieve better performance than query-driven methods [21]. However, existing approaches to the SCE problem cannot address the MCE problem as they only estimate the cardinality of a single query over a single dataset while we estimate the cardinality for a query set over a set of datasets.…”

Section: Related Work (mentioning)

confidence: 99%


Representative Routes Discovery from Massive Trajectories

Wang, Huang, Bao, et al. 2022

Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining


As available data increases, so too does the demand to discover new datasets to solve new problems. Existing studies that use a base dataset and a keyword search often yield coarse-grained results in which significant information overlaps and irrelevant data appears. They also implicitly assume that a user can purchase all datasets found, which is rarely true in practice. It is therefore desirable to achieve dataset discovery results with less redundancy by using more fine-grained information needs and a budget. To this end, we study the problem of finding a set of datasets that maximizes distinctiveness based on a user's fine-grained information needs while keeping the total price of the datasets within a user-defined budget. Note that the user may also have a base dataset that they want to expand. Here, the user's fine-grained information needs are expressed as a query set, and the distinctiveness score for a set of datasets is the number of distinct tuples, produced by running the query set on those datasets, that do not overlap with the base dataset. First, we prove the NP-hardness of this problem. Then, we develop a greedy algorithm that achieves an approximation ratio of (1 − 1/e)/2. However, this algorithm is neither efficient nor scalable, because it frequently computes the exact distinctiveness during dataset selection, which requires testing every tuple for query-result overlap across multiple datasets. To address this limitation, we propose an efficient and effective machine-learning-based (ML-based) algorithm that estimates the distinctiveness of a set of datasets without testing every tuple. The proposed algorithm is the first to support cardinality estimation (CE) for a query set on multiple datasets; previous studies only support CE for a single query on a single dataset and cannot effectively identify query-result overlaps across multiple datasets. Extensive experiments using five real-world data pools demonstrate that our greedy algorithm using ML-based distinctiveness estimation outperforms all other baselines in both effectiveness and efficiency.
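The budgeted selection described in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm, whose steps the abstract does not give. It assumes one common scheme for budgeted coverage problems: repeatedly pick the affordable dataset with the largest marginal gain per unit price, then compare the result against the best single affordable dataset. The names `distinctiveness`, `greedy_select`, `results`, and `prices` are hypothetical, prices are assumed to be positive, and each dataset is represented by the set of tuples that the query set returns on it. Distinctiveness is computed exactly here, which is the costly step the abstract proposes to replace with an ML-based estimate.

```python
from typing import Dict, Hashable, List, Set


def distinctiveness(selected: List[str],
                    results: Dict[str, Set[Hashable]],
                    base: Set[Hashable]) -> int:
    """Count distinct result tuples of the selected datasets not in the base."""
    covered: Set[Hashable] = set()
    for name in selected:
        covered |= results[name]
    return len(covered - base)


def greedy_select(results: Dict[str, Set[Hashable]],
                  prices: Dict[str, float],
                  base: Set[Hashable],
                  budget: float) -> List[str]:
    """Assumed cost-benefit greedy under a budget (illustrative only)."""
    selected: List[str] = []
    covered: Set[Hashable] = set()
    spent = 0.0
    remaining = set(results)
    while remaining:
        best, best_ratio = None, 0.0
        for name in sorted(remaining):          # sorted for determinism
            if spent + prices[name] > budget:
                continue                        # cannot afford this dataset
            gain = len(results[name] - covered - base)
            ratio = gain / prices[name]         # marginal gain per unit price
            if ratio > best_ratio:
                best, best_ratio = name, ratio
        if best is None:
            break                               # nothing affordable adds tuples
        selected.append(best)
        covered |= results[best]
        spent += prices[best]
        remaining.remove(best)

    # Safeguard: one expensive dataset may beat many cheap ones.
    affordable = [n for n in sorted(results) if prices[n] <= budget]
    if affordable:
        top = max(affordable, key=lambda n: len(results[n] - base))
        if len(results[top] - base) > distinctiveness(selected, results, base):
            return [top]
    return selected


# Toy data: tuples are plain integers, dataset names are hypothetical.
results = {"A": {1, 2, 3, 4}, "B": {3, 4, 5}, "C": {6},
           "D": {1, 2, 3, 4, 5, 6, 7, 8}}
prices = {"A": 2.0, "B": 1.0, "C": 1.0, "D": 4.0}
base = {1}
print(greedy_select(results, prices, base, budget=4.0))
```

With a budget of 4, the ratio-based pass in this toy example selects the three cheap datasets, and the final comparison replaces them with the single dataset `D`, which contributes more distinct tuples on its own.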


Quantum Data Management and Quantum Machine Learning for Data Management: State-of-the-Art and Open Challenges

Groppe, Çalıkyılmaz, et al. 2023

Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
