“…In our experiments, for tables in database 1 we use different bucket numbers 10, 15, 20, 25, 30 to produce corresponding histograms. For tables in database 2, we use bucket numbers 10, 20, 30, 40, 50 to produce different histograms.…”
Section: Methods (mentioning, confidence: 99%)
“…Our paradigm follows the framework in [2,10,11,12]. Let P*(X, Y) represent the optimal result of using Y buckets to partition the first X values of T.…”
Section: A Dynamic Programming Paradigm (mentioning, confidence: 99%)
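The definition of P*(X, Y) implies the standard recurrence for optimal one-dimensional partitioning: the best Y-bucket partition of the first X values is found by trying every split point for the last bucket. The sketch below is a minimal illustration of that recurrence; the sum-of-squared-errors bucket cost and the names `bucket_err` and `optimal_partition` are our assumptions, not taken from the paper.

```python
# Minimal sketch of the dynamic-programming paradigm:
#   P*(X, Y) = min over j < X of  P*(j, Y-1) + err(j+1 .. X)
# where err() is the cost of covering a run of values with one bucket.
# Assumption: sum of squared errors around the bucket mean as the cost.

def bucket_err(prefix, prefix_sq, i, j):
    """SSE of values T[i..j] (1-indexed), computed from prefix sums."""
    n = j - i + 1
    s = prefix[j] - prefix[i - 1]
    sq = prefix_sq[j] - prefix_sq[i - 1]
    return sq - s * s / n

def optimal_partition(T, B):
    """Return P*(len(T), B) under the SSE bucket cost."""
    n = len(T)
    prefix = [0.0] * (n + 1)
    prefix_sq = [0.0] * (n + 1)
    for k, v in enumerate(T, 1):
        prefix[k] = prefix[k - 1] + v
        prefix_sq[k] = prefix_sq[k - 1] + v * v

    INF = float("inf")
    # P[x][y] = optimal cost of covering the first x values with y buckets
    P = [[INF] * (B + 1) for _ in range(n + 1)]
    P[0][0] = 0.0
    for x in range(1, n + 1):
        for y in range(1, min(B, x) + 1):
            for j in range(y - 1, x):  # last bucket covers values j+1 .. x
                cost = P[j][y - 1] + bucket_err(prefix, prefix_sq, j + 1, x)
                if cost < P[x][y]:
                    P[x][y] = cost
    return P[n][B]
```

This runs in O(n²B) time, which is why later work (as the Related Work snippet below notes) looks for heuristics to reduce the overhead of the dynamic program.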
“…The key issues in histogram techniques are: how to partition the original data, what to store in each bucket, and how to estimate the result of an aggregation query for a given histogram. Many histogram techniques have been developed in [7,8,9,10,13,17].…”
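To make the three key issues concrete, here is a minimal sketch giving one simple answer to each: equi-width partitioning, a per-bucket count as the stored summary, and a uniform-spread estimate for a range COUNT query. All names are illustrative; this is not the specific method of any of the cited papers.

```python
# Illustrative equi-width histogram: partition the domain into equal
# ranges, store only a count per bucket, and estimate a range COUNT
# by assuming values are spread uniformly inside each bucket.

def build_histogram(values, num_buckets):
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_buckets or 1.0  # guard: all values equal
    counts = [0] * num_buckets
    for v in values:
        idx = min(int((v - lo) / width), num_buckets - 1)
        counts[idx] += 1
    return lo, width, counts

def estimate_range_count(hist, a, b):
    """Estimate |{v : a <= v <= b}| using only the histogram."""
    lo, width, counts = hist
    total = 0.0
    for i, c in enumerate(counts):
        b_lo = lo + i * width
        b_hi = b_lo + width
        # fraction of bucket i overlapping [a, b] (uniform-spread assumption)
        overlap = max(0.0, min(b, b_hi) - max(a, b_lo))
        total += c * overlap / width
    return total
```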
Abstract. Approximation is a very effective paradigm for speeding up query processing in large databases. One popular approximation mechanism is data size reduction, for which there are three main techniques: sampling, histograms, and wavelets. Histogram techniques are supported by many commercial database systems and have been shown to be very effective for approximately processing aggregation queries. In this paper, we investigate optimal models for building histograms based on linear spline techniques. We first propose several novel models, and then present efficient algorithms to achieve these proposed optimal models. Our experimental results show that our new techniques greatly improve approximation accuracy compared to existing techniques.
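The phrase "linear spline techniques" suggests that each bucket stores a fitted line segment rather than a single flat average, so the histogram can track a trend inside a bucket. The following is a minimal sketch of that per-bucket least-squares fit under our own naming (`fit_bucket_line`, `spline_histogram`); the paper's actual models and algorithms may differ.

```python
# Sketch of the per-bucket linear fit behind a linear-spline histogram:
# instead of one average per bucket, store the slope and intercept of
# the least-squares line through the bucket's (position, value) pairs.
# Assumption: ordinary least squares per bucket; names are ours.

def fit_bucket_line(xs, ys):
    """Return (slope, intercept) of the least-squares line."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, my - slope * mx

def spline_histogram(values, boundaries):
    """boundaries: list of half-open (start, end) index pairs, one per bucket."""
    segments = []
    for start, end in boundaries:
        xs = list(range(start, end))
        ys = values[start:end]
        segments.append((start, end, fit_bucket_line(xs, ys)))
    return segments
```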
“…However, this work offers no solution regarding which set of synopses to construct and does not take into account the characteristics of the workload. A similar approach for histograms was proposed in [15], extending [19] by offering heuristics that reduce the overhead of the dynamic programming problem. [6] considers a limited version of the problem: a set of synopses for query optimization is selected, based on whether or not they make a difference in plan selection.…”
Section: Related Work (mentioning, confidence: 99%)
“…The queries selected were Q1, Q6, Q13, Q15, and Q17, referring to the Lineitem, Part, Orders, and Customer tables. Table 2 shows the query-relevant attribute sets, the minimum sets Min(Qi), for the above five queries.…”
Abstract. Maintaining statistics on multidimensional data distributions is crucial for predicting the run-time and result size of queries and data analysis tasks with acceptable accuracy. To this end a plethora of techniques have been proposed for maintaining a compact data "synopsis" on a single table, ranging from variants of histograms to methods based on wavelets and other transforms. However, the fundamental question of how to reconcile the synopses for large information sources with many tables has been largely unexplored. This paper develops a general framework for reconciling the synopses on many tables, which may come from different information sources. It shows how to compute the optimal combination of synopses for a given workload and a limited amount of available memory. The practicality of the approach and the accuracy of the proposed heuristics are demonstrated by experiments.
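Computing "the optimal combination of synopses for a given workload and a limited amount of available memory" is a budgeted selection problem. As one way to see its shape, here is a minimal 0/1-knapsack sketch in which each candidate synopsis has a memory cost and an estimated workload benefit; this is our illustrative formulation, not necessarily the paper's exact model or algorithm.

```python
# Sketch of choosing synopses under a memory budget as a 0/1 knapsack:
# candidate synopsis i has memory cost cost[i] (positive integer units)
# and an estimated workload benefit benefit[i] (e.g. error reduction).
# Maximize total benefit without exceeding the budget.

def select_synopses(cost, benefit, budget):
    n = len(cost)
    # best[m] = max benefit achievable with at most m units of memory
    best = [0.0] * (budget + 1)
    choice = [[False] * n for _ in range(budget + 1)]
    for i in range(n):
        # iterate memory downward so each synopsis is used at most once
        for m in range(budget, cost[i] - 1, -1):
            cand = best[m - cost[i]] + benefit[i]
            if cand > best[m]:
                best[m] = cand
                choice[m] = choice[m - cost[i]][:]
                choice[m][i] = True
    picked = [i for i, used in enumerate(choice[budget]) if used]
    return best[budget], picked
```

A real reconciliation framework must also estimate each synopsis's benefit from the workload and allow a synopsis's size (and hence its cost/benefit trade-off) to vary, which is what makes the problem harder than plain knapsack.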