“…In our experiments, for tables in database 1 we use different bucket numbers 10, 15, 20, 25, 30 to produce corresponding histograms. For tables in database 2, we use bucket numbers 10, 20, 30, 40, 50 to produce different histograms.…”
Section: Methods (mentioning, confidence: 99%)
“…Our paradigm follows the framework in [2,10,11,12]. Let P*(X, Y) represent the optimal result of using Y buckets to partition the first X values of T.…”
Section: A Dynamic Programming Paradigm (mentioning, confidence: 99%)
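The definition of P*(X, Y) implies the standard recurrence for optimal one-dimensional partitioning: the best Y-bucket partition of the first X values is found by trying every split point for the last bucket. The sketch below is a minimal illustration of that recurrence; the sum-of-squared-errors bucket cost and the names `bucket_err` and `optimal_partition` are our assumptions, not taken from the paper.

```python
# Minimal sketch of the dynamic-programming paradigm:
#   P*(X, Y) = min over j < X of  P*(j, Y-1) + err(j+1 .. X)
# where err() is the cost of covering a run of values with one bucket.
# Assumption: sum of squared errors around the bucket mean as the cost.

def bucket_err(prefix, prefix_sq, i, j):
    """SSE of values T[i..j] (1-indexed), computed from prefix sums."""
    n = j - i + 1
    s = prefix[j] - prefix[i - 1]
    sq = prefix_sq[j] - prefix_sq[i - 1]
    return sq - s * s / n

def optimal_partition(T, B):
    """Return P*(len(T), B) under the SSE bucket cost."""
    n = len(T)
    prefix = [0.0] * (n + 1)
    prefix_sq = [0.0] * (n + 1)
    for k, v in enumerate(T, 1):
        prefix[k] = prefix[k - 1] + v
        prefix_sq[k] = prefix_sq[k - 1] + v * v

    INF = float("inf")
    # P[x][y] = optimal cost of covering the first x values with y buckets
    P = [[INF] * (B + 1) for _ in range(n + 1)]
    P[0][0] = 0.0
    for x in range(1, n + 1):
        for y in range(1, min(B, x) + 1):
            for j in range(y - 1, x):  # last bucket covers values j+1 .. x
                cost = P[j][y - 1] + bucket_err(prefix, prefix_sq, j + 1, x)
                if cost < P[x][y]:
                    P[x][y] = cost
    return P[n][B]
```

This runs in O(n²B) time, which is why later work (as the Related Work snippet below notes) looks for heuristics to reduce the overhead of the dynamic program.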
“…The key issues in histogram techniques are: how to partition the original data, what to store in each bucket, and how to estimate the result of an aggregation query for a given histogram. Many histogram techniques have been developed in [7,8,9,10,13,17].…”
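To make the three key issues concrete, here is a minimal sketch giving one simple answer to each: equi-width partitioning, a per-bucket count as the stored summary, and a uniform-spread estimate for a range COUNT query. All names are illustrative; this is not the specific method of any of the cited papers.

```python
# Illustrative equi-width histogram: partition the domain into equal
# ranges, store only a count per bucket, and estimate a range COUNT
# by assuming values are spread uniformly inside each bucket.

def build_histogram(values, num_buckets):
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_buckets or 1.0  # guard: all values equal
    counts = [0] * num_buckets
    for v in values:
        idx = min(int((v - lo) / width), num_buckets - 1)
        counts[idx] += 1
    return lo, width, counts

def estimate_range_count(hist, a, b):
    """Estimate |{v : a <= v <= b}| using only the histogram."""
    lo, width, counts = hist
    total = 0.0
    for i, c in enumerate(counts):
        b_lo = lo + i * width
        b_hi = b_lo + width
        # fraction of bucket i overlapping [a, b] (uniform-spread assumption)
        overlap = max(0.0, min(b, b_hi) - max(a, b_lo))
        total += c * overlap / width
    return total
```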
Abstract. Approximation is a very effective paradigm for speeding up query processing in large databases. One popular approximation mechanism is data size reduction, for which there are three main techniques: sampling, histograms, and wavelets. Histogram techniques are supported by many commercial database systems and have been shown to be very effective for approximately processing aggregation queries. In this paper, we investigate optimal models for building histograms based on linear spline techniques. We first propose several novel models, and then present efficient algorithms to achieve these proposed optimal models. Our experimental results show that our new techniques greatly improve approximation accuracy compared to existing techniques.
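The phrase "linear spline techniques" suggests that each bucket stores a fitted line segment rather than a single flat average, so the histogram can track a trend inside a bucket. The following is a minimal sketch of that per-bucket least-squares fit under our own naming (`fit_bucket_line`, `spline_histogram`); the paper's actual models and algorithms may differ.

```python
# Sketch of the per-bucket linear fit behind a linear-spline histogram:
# instead of one average per bucket, store the slope and intercept of
# the least-squares line through the bucket's (position, value) pairs.
# Assumption: ordinary least squares per bucket; names are ours.

def fit_bucket_line(xs, ys):
    """Return (slope, intercept) of the least-squares line."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, my - slope * mx

def spline_histogram(values, boundaries):
    """boundaries: list of half-open (start, end) index pairs, one per bucket."""
    segments = []
    for start, end in boundaries:
        xs = list(range(start, end))
        ys = values[start:end]
        segments.append((start, end, fit_bucket_line(xs, ys)))
    return segments
```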
“…However, this work offers no solution regarding which set of synopses to construct and does not take into account the characteristics of the workload. A similar approach for histograms was proposed in [15], extending [19] by offering heuristics that reduce the overhead of the dynamic programming problem. [6] considers a limited version of the problem: a set of synopses for query optimization is selected, based on whether or not they make a difference in plan selection.…”
Section: Related Work (mentioning, confidence: 99%)
“…The queries selected were Q1, Q6, Q13, Q15, and Q17, referring to the Lineitem, Part, Orders, and Customer tables. Table 2 shows the query-relevant attribute sets, the minimum sets Min(Qi), for the above five queries.…”
Abstract. Maintaining statistics on multidimensional data distributions is crucial for predicting the run-time and result size of queries and data analysis tasks with acceptable accuracy. To this end a plethora of techniques have been proposed for maintaining a compact data "synopsis" on a single table, ranging from variants of histograms to methods based on wavelets and other transforms. However, the fundamental question of how to reconcile the synopses for large information sources with many tables has been largely unexplored. This paper develops a general framework for reconciling the synopses on many tables, which may come from different information sources. It shows how to compute the optimal combination of synopses for a given workload and a limited amount of available memory. The practicality of the approach and the accuracy of the proposed heuristics are demonstrated by experiments.
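Computing "the optimal combination of synopses for a given workload and a limited amount of available memory" is a budgeted selection problem. As one way to see its shape, here is a minimal 0/1-knapsack sketch in which each candidate synopsis has a memory cost and an estimated workload benefit; this is our illustrative formulation, not necessarily the paper's exact model or algorithm.

```python
# Sketch of choosing synopses under a memory budget as a 0/1 knapsack:
# candidate synopsis i has memory cost cost[i] (positive integer units)
# and an estimated workload benefit benefit[i] (e.g. error reduction).
# Maximize total benefit without exceeding the budget.

def select_synopses(cost, benefit, budget):
    n = len(cost)
    # best[m] = max benefit achievable with at most m units of memory
    best = [0.0] * (budget + 1)
    choice = [[False] * n for _ in range(budget + 1)]
    for i in range(n):
        # iterate memory downward so each synopsis is used at most once
        for m in range(budget, cost[i] - 1, -1):
            cand = best[m - cost[i]] + benefit[i]
            if cand > best[m]:
                best[m] = cand
                choice[m] = choice[m - cost[i]][:]
                choice[m][i] = True
    picked = [i for i, used in enumerate(choice[budget]) if used]
    return best[budget], picked
```

A real reconciliation framework must also estimate each synopsis's benefit from the workload and allow a synopsis's size (and hence its cost/benefit trade-off) to vary, which is what makes the problem harder than plain knapsack.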