Database columns are often correlated, so cardinality estimates computed under an independence assumption frequently lead the optimizer to a poor choice of query plan. Multidimensional histograms can help solve this problem, but the traditional approach of building such histograms using a data scan often scales poorly and does not always yield the best histogram for a given workload. An attractive alternative is to gather feedback from the query execution engine about the observed cardinality of predicates and use this feedback as the basis for a histogram. In this paper we describe ISOMER, a new feedback-based algorithm for collecting optimizer statistics by constructing and maintaining multidimensional histograms. ISOMER uses the maximum-entropy principle to approximate the true data distribution by a histogram distribution that is as "simple" as possible while being consistent with the observed predicate cardinalities. ISOMER adapts readily to changes in the underlying data, automatically detecting and eliminating inconsistent feedback information in an efficient manner. The algorithm controls the size of the histogram by retaining only the most "important" feedback. Our experiments indicate that, unlike previous methods for feedback-driven histogram maintenance, ISOMER imposes little overhead, is extremely scalable, and yields highly accurate cardinality estimates while using only a modest amount of storage.
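The abstract's core idea, choosing the "simplest" (maximum-entropy) distribution consistent with observed predicate cardinalities, can be illustrated with a small sketch. The bucket layout, feedback values, and the use of iterative proportional fitting below are illustrative assumptions, not the paper's exact algorithm: ISOMER additionally maintains the histogram structure, detects and discards inconsistent feedback, and prunes less important constraints.

```python
# Minimal sketch: fit a maximum-entropy histogram to query feedback.
# Assumes a fixed bucketization and mutually consistent feedback.

n_buckets = 4
# Each feedback record: (bucket indices a predicate covers,
#                        observed fraction of rows satisfying it).
# Hypothetical values standing in for execution-engine feedback.
feedback = [
    ({0, 1}, 0.7),   # e.g. predicate "a < 10"
    ({1, 2}, 0.5),   # e.g. predicate "5 <= a < 20"
]

# Start from the uniform (maximum-entropy) distribution and repeatedly
# rescale buckets inside and outside each predicate so the mass under the
# predicate matches its observed fraction. This iterative proportional
# fitting converges to the maximum-entropy distribution consistent with
# all the constraints. The total mass stays 1 at every step.
p = [1.0 / n_buckets] * n_buckets
for _ in range(100):
    for covered, target in feedback:
        inside = sum(p[i] for i in covered)
        outside = 1.0 - inside
        for i in range(n_buckets):
            if i in covered and inside > 0:
                p[i] *= target / inside
            elif i not in covered and outside > 0:
                p[i] *= (1.0 - target) / outside

print([round(x, 3) for x in p])  # -> [0.35, 0.35, 0.15, 0.15]
```

For this example the fit spreads the unconstrained mass as evenly as the constraints allow, which is exactly the "simplicity" the maximum-entropy principle formalizes: no structure is assumed beyond what the feedback implies.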
Enterprises are adopting large-scale data processing platforms, such as Hadoop, to gain actionable insights from their "big data". Query optimization is still an open challenge in this environment due to the volume and heterogeneity of data, comprising both structured and un/semi-structured datasets. Moreover, it has become common practice to push business logic close to the data via user-defined functions (UDFs), which are usually opaque to the optimizer, further complicating cost-based optimization. As a result, classical relational query optimization techniques do not fit well in this setting, while at the same time, suboptimal query plans can be disastrous with large datasets. In this paper, we propose new techniques that take into account UDFs and correlations between relations for optimizing queries running on large-scale clusters. We introduce "pilot runs", which execute part of the query over a sample of the data to estimate selectivities, and employ a cost-based optimizer that uses these selectivities to choose an initial query plan. Then, we follow a dynamic optimization approach, in which plans evolve as parts of the queries get executed. Our experimental results show that our techniques produce plans that are at least as good as the best hand-written left-deep query plans, and up to 2x better for Jaql and 4x better for Hive.
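A minimal sketch of the "pilot run" idea described in the abstract: execute an opaque UDF predicate over a small random sample of the data to estimate its selectivity before committing to a plan. The table, sample size, and udf_predicate below are hypothetical stand-ins; the paper's setting is a distributed cluster (Jaql/Hive), not an in-memory list.

```python
# Minimal sketch of a pilot run for UDF selectivity estimation.
import random

def udf_predicate(row):
    # Opaque business logic the optimizer cannot reason about statically.
    return row["score"] > 0.8 and row["region"] == "EU"

# Hypothetical base table; in the paper's setting this is a large
# distributed dataset, not an in-memory list.
table = [{"score": random.random(),
          "region": random.choice(["EU", "US", "APAC"])}
         for _ in range(100_000)]

SAMPLE_SIZE = 5_000
sample = random.sample(table, SAMPLE_SIZE)

# Pilot run: evaluate the UDF on the sample only.
hits = sum(1 for row in sample if udf_predicate(row))
selectivity = hits / SAMPLE_SIZE
print(f"estimated selectivity: {selectivity:.4f} "
      f"(~{selectivity * len(table):,.0f} of {len(table):,} rows)")

# A cost-based optimizer would feed this estimate into plan selection,
# e.g. applying the most selective predicates first or reordering joins;
# the dynamic-optimization step then revises the plan as parts of the
# query actually execute.
```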