2015
DOI: 10.14778/2850583.2850594
How good are query optimizers, really?

Abstract: Finding a good join order is crucial for query performance. In this paper, we introduce the Join Order Benchmark (JOB) and experimentally revisit the main components in the classic query optimizer architecture using a complex, real-world data set and realistic multi-join queries. We investigate the quality of industrial-strength cardinality estimators and find that all estimators routinely produce large errors. We further show that while estimates are essential for finding a good join order, query performance …

Cited by 447 publications (357 citation statements)
References 41 publications
“…The purpose of this paper is to shed light on profitable directions to explore. We consider the Join Order Benchmark (JOB) proposed by Leis et al. [12]. JOB is a workload of 113 queries, with varying numbers of joins.…”
Section: Introduction
confidence: 99%
“…In this paper, we have introduced a new variant of TPC-H, named JCC-H, that adds correlations and skew to TPC-H. JCC-H was carefully designed to include very severe join skew as well as filter skew. Moreover, these skew effects are observed by the original 22 TPC-H queries only if special parameters are given to them.…”
Section: Results
confidence: 99%
“…This type of correlation was long elusive for query optimizers using the independence assumption, but thanks to the ample CPU power available nowadays, cardinality estimation is increasingly done by executing predicates on table samples, which catches any correlation within a single table. It was recently confirmed [7] that faulty cardinality estimation is the main problem for join-order optimization (arguably the most important query optimization problem), and as such the frontier for systems and for database research in this area is correlations not within the same table, but across different tables. To continue the example, in a join of Panameras towards a SALES(date, price, brand, type) table, the optimizer would probably mis-estimate the cardinality of extract(year from date) between 2000 and 2010 because the Panamera was introduced only in 2009.…”
Section: Introduction
confidence: 99%
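The cross-table correlation problem this excerpt describes can be illustrated with a small simulation (a sketch, not code from any of the cited papers; the SALES-like data here is entirely synthetic): multiplying per-column selectivities under the independence assumption diverges from the true cardinality as soon as the columns are correlated.

```python
# Sketch (assumption-laden, synthetic data): how the independence
# assumption mis-estimates the selectivity of correlated predicates.
import random

random.seed(0)

# Hypothetical sales table (year, type): 'Panamera' rows exist only
# from 2009 on, creating a cross-column correlation.
rows = []
for _ in range(10_000):
    year = random.randint(1995, 2014)
    typ = "Panamera" if year >= 2009 and random.random() < 0.2 else "Other"
    rows.append((year, typ))

n = len(rows)
sel_type = sum(t == "Panamera" for _, t in rows) / n
sel_year = sum(2000 <= y <= 2010 for y, _ in rows) / n

# Independence assumption: multiply the single-column selectivities.
est_independent = sel_type * sel_year * n

# Actual cardinality of the conjunctive predicate.
actual = sum(t == "Panamera" and 2000 <= y <= 2010 for y, t in rows)

print(f"independence estimate: {est_independent:.0f}, actual: {actual}")
```

With these synthetic numbers the independence assumption overestimates, because it spreads Panamera sales uniformly over all years of the date range, mirroring the mis-estimate the excerpt predicts; a per-table sample would not help, since the correlation only appears after the join.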
“…Without splitting the IS NULL disjunctions introduced by our translation, PostgreSQL produces query plans with astronomical costs, as it resorts to nested-loop joins even for large tables. This happens because it underestimates the size of joins, a known issue for major DBMSs [21]. To make the optimizer produce better estimates and a reasonable query plan, the direct translation of these queries may also require additional hand-tuning involving common table expressions.…”
Section: T WHERE (A=B OR A IS NULL OR B IS NULL) AND · · · AND (
confidence: 99%
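The splitting mentioned above can be sketched with a toy rewrite (a hypothetical illustration using SQLite rather than PostgreSQL, only to show that the disjunctive join predicate and its UNION split return the same rows; the tables and columns are invented):

```python
# Sketch: splitting an IS NULL disjunction in a join predicate into a
# UNION of disjunct-free branches, one form of the hand-tuning the
# quoted passage mentions. Synthetic tables, in-memory SQLite.
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE r (id INTEGER, a INTEGER)")
cur.execute("CREATE TABLE s (id INTEGER, b INTEGER)")
cur.executemany("INSERT INTO r VALUES (?, ?)",
                [(1, 10), (2, None), (3, 30)])
cur.executemany("INSERT INTO s VALUES (?, ?)",
                [(1, 10), (2, 20), (3, None)])

# Original form: one join predicate with IS NULL disjuncts.
disjunctive = cur.execute("""
    SELECT r.id, s.id FROM r JOIN s
    ON (r.a = s.b OR r.a IS NULL OR s.b IS NULL)
""").fetchall()

# Rewritten form: one disjunct per branch, combined with UNION
# (UNION de-duplicates, so overlapping disjuncts are not double-counted).
split = cur.execute("""
    SELECT r.id, s.id FROM r JOIN s ON r.a = s.b
    UNION
    SELECT r.id, s.id FROM r JOIN s ON r.a IS NULL
    UNION
    SELECT r.id, s.id FROM r JOIN s ON s.b IS NULL
""").fetchall()

print("disjunctive:", sorted(set(disjunctive)))
print("union split:", sorted(split))
```

Each UNION branch has a conjunctive (or trivial) join condition, which an optimizer can estimate and plan far more reliably than the original OR; the trade-off is that the input relations are scanned once per branch.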