2015
DOI: 10.14778/2850583.2850594
How good are query optimizers, really?

Abstract: Finding a good join order is crucial for query performance. In this paper, we introduce the Join Order Benchmark (JOB) and experimentally revisit the main components in the classic query optimizer architecture using a complex, real-world data set and realistic multi-join queries. We investigate the quality of industrial-strength cardinality estimators and find that all estimators routinely produce large errors. We further show that while estimates are essential for finding a good join order, query performance …

Cited by 447 publications (357 citation statements)
References 41 publications
“…The purpose of this paper is to shed light on profitable directions to explore. We consider the Join Order Benchmark (JOB) proposed by Leis et al. [12]. JOB is a workload of 113 queries, with varying numbers of joins.…”
Section: Introduction
confidence: 99%
“…In this paper, we have introduced a new variant of TPC-H, named JCC-H, that adds correlations and skew to TPC-H. JCC-H was carefully designed to include very severe join skew as well as filter skew. Moreover, these skew effects are observed by the original 22 TPC-H queries only if special parameters are given to them.…”
Section: Results
confidence: 99%
“…This type of correlation was long elusive for query optimizers using the independence assumption, but thanks to the ample CPU power available nowadays, cardinality estimation is increasingly done by executing predicates on table samples, which catches any correlation within a single table. It was recently confirmed [7] that faulty cardinality estimation is the main problem for join-order optimization (arguably the most important query optimization problem), and as such the frontier for systems and for database research in this area is correlations not within the same table, but across different tables. To continue the example, in a join of Panameras towards a SALES(date, price, brand, type) table, the optimizer would probably mis-estimate the cardinality of extract(year from date) between 2000 and 2010 because the Panamera was introduced only in 2009.…”
Section: Introduction
confidence: 99%
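The cross-table correlation problem this excerpt describes can be illustrated with a small simulation (a sketch, not code from any of the cited papers; the SALES-like data here is entirely synthetic): multiplying per-column selectivities under the independence assumption diverges from the true cardinality as soon as the columns are correlated.

```python
# Sketch (assumption-laden, synthetic data): how the independence
# assumption mis-estimates the selectivity of correlated predicates.
import random

random.seed(0)

# Hypothetical sales table (year, type): 'Panamera' rows exist only
# from 2009 on, creating a cross-column correlation.
rows = []
for _ in range(10_000):
    year = random.randint(1995, 2014)
    typ = "Panamera" if year >= 2009 and random.random() < 0.2 else "Other"
    rows.append((year, typ))

n = len(rows)
sel_type = sum(t == "Panamera" for _, t in rows) / n
sel_year = sum(2000 <= y <= 2010 for y, _ in rows) / n

# Independence assumption: multiply the single-column selectivities.
est_independent = sel_type * sel_year * n

# Actual cardinality of the conjunctive predicate.
actual = sum(t == "Panamera" and 2000 <= y <= 2010 for y, t in rows)

print(f"independence estimate: {est_independent:.0f}, actual: {actual}")
```

With these synthetic numbers the independence assumption overestimates, because it spreads Panamera sales uniformly over all years of the date range, mirroring the mis-estimate the excerpt predicts; a per-table sample would not help, since the correlation only appears after the join.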
“…Without splitting the IS NULL disjunctions introduced by our translation, PostgreSQL produces query plans with astronomical costs, as it resorts to nested-loop joins even for large tables. This happens because it underestimates the size of joins, a known issue for major DBMSs [21]. To make the optimizer produce better estimates and a reasonable query plan, the direct translation of these queries may also require additional hand-tuning involving common table expressions.…”
Section: T WHERE (A=B OR A IS NULL OR B IS NULL) AND · · · AND (
confidence: 99%
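The splitting mentioned above can be sketched with a toy rewrite (a hypothetical illustration using SQLite rather than PostgreSQL, only to show that the disjunctive join predicate and its UNION split return the same rows; the tables and columns are invented):

```python
# Sketch: splitting an IS NULL disjunction in a join predicate into a
# UNION of disjunct-free branches, one form of the hand-tuning the
# quoted passage mentions. Synthetic tables, in-memory SQLite.
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE r (id INTEGER, a INTEGER)")
cur.execute("CREATE TABLE s (id INTEGER, b INTEGER)")
cur.executemany("INSERT INTO r VALUES (?, ?)",
                [(1, 10), (2, None), (3, 30)])
cur.executemany("INSERT INTO s VALUES (?, ?)",
                [(1, 10), (2, 20), (3, None)])

# Original form: one join predicate with IS NULL disjuncts.
disjunctive = cur.execute("""
    SELECT r.id, s.id FROM r JOIN s
    ON (r.a = s.b OR r.a IS NULL OR s.b IS NULL)
""").fetchall()

# Rewritten form: one disjunct per branch, combined with UNION
# (UNION de-duplicates, so overlapping disjuncts are not double-counted).
split = cur.execute("""
    SELECT r.id, s.id FROM r JOIN s ON r.a = s.b
    UNION
    SELECT r.id, s.id FROM r JOIN s ON r.a IS NULL
    UNION
    SELECT r.id, s.id FROM r JOIN s ON s.b IS NULL
""").fetchall()

print("disjunctive:", sorted(set(disjunctive)))
print("union split:", sorted(split))
```

Each UNION branch has a conjunctive (or trivial) join condition, which an optimizer can estimate and plan far more reliably than the original OR; the trade-off is that the input relations are scanned once per branch.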