Data cube approximation and histograms via wavelets

Vitter, Jeffrey Scott; Wang, Min; Iyer, B. R.

doi:10.1145/288627.288645

Cited by 183 publications

(135 citation statements)

References 16 publications


“…There exists a sizeable bibliography in approximate query answering techniques [18], [30], [19], [37], [3], [35], [10], [11]. Our approach is fundamentally different.…”

Section: Discussion (mentioning)

confidence: 99%

Using datacube aggregates for approximate querying and deviation detection

Palpanas

Koudas

Mendelzon

2005

IEEE Trans. Knowl. Data Eng.


Abstract: Much research has been devoted to the efficient computation of relational aggregations and, specifically, the efficient execution of the datacube operation. In this paper, we consider the inverse problem, that of deriving (approximately) the original data from the aggregates. We motivate this problem in the context of two specific application areas, approximate query answering and data analysis. We propose a framework based on the notion of information entropy that enables us to estimate the original values in a data set, given only aggregated information about it. We then show how approximate queries on the data from which the aggregates were derived can be performed using our framework. We also describe an alternate use of the proposed framework that enables us to identify values that deviate from the underlying data distribution, suitable for data mining purposes. We present a detailed performance study of the algorithms using both real and synthetic data, highlighting the benefits of our approach as well as the efficiency of the proposed solutions. Finally, we evaluate our techniques with a case study on a real data set, which illustrates the applicability of our approach.


“…Vitter et al [20,21] propose approximating data cubes using the wavelet transform. While [20] explicitly deals with the aspect of sparseness (which is not addressed in this paper) [21], like IDC, targets MOLAP data cubes.…”

Section: Related Work (mentioning)

confidence: 99%

“…While [20] explicitly deals with the aspect of sparseness (which is not addressed in this paper) [21], like IDC, targets MOLAP data cubes. Wavelets offer a compact representation of the data cube on multiple levels of resolution.…”

Section: Related Work (mentioning)

confidence: 99%

“…Using wavelets to encode the original data cube, however, increases the update costs and does not result in a better worst case performance when exact results are required. While [21] proposes encoding the pre-aggregated data cube which is used for the PS technique, any pre-aggregated (or the original) data cube can be encoded using wavelets. In that sense wavelet transform and IDC are orthogonal techniques.…”

Section: Related Work (mentioning)

confidence: 99%


Flexible Data Cubes for Online Aggregation

Riedewald

Agrawal

Abbadi

2001

Database Theory — ICDT 2001


Abstract. Applications like Online Analytical Processing depend heavily on the ability to quickly summarize large amounts of information. Techniques were proposed recently that speed up aggregate range queries on MOLAP data cubes by storing pre-computed aggregates. These approaches try to handle data cubes of any dimensionality by dealing with all dimensions at the same time and treat the different dimensions uniformly. The algorithms are typically complex, and it is difficult to prove their correctness and to analyze their performance. We present a new technique to generate Iterative Data Cubes (IDC) that addresses these problems. The proposed approach provides a modular framework for combining one-dimensional aggregation techniques to create space-optimal high-dimensional data cubes. A large variety of cost tradeoffs for high-dimensional IDC can be generated, making it easy to find the right configuration based on the application requirements.


“…In the last 30 years, there have been a huge amount of works about synopses data structures applied in approximate answering approaches, whose main contributions are: 1) histograms [GMP97, GK01, PIHS96], that partition attribute values domain into a set of buckets; 2) samples [Olk93], which are based on the idea that a small random sample S of the data often well represents the entire data; 3) Wavelets [VWI98], which are a mathematical tool for hierarchical decomposition of functions/signals. Multi-dimensional data synopses are used to approximate the joint data distribution of multiple attributes [AGPR99].…”

Section: Synopsis Data Structures For Relational Data Warehouses (mentioning)

confidence: 99%

A Synopsis based Approach for XML Fast Approximate Querying

Comai

Marrara

Tanca

Flexible Databases Supporting Imprecision and Uncertainty


Summary. XML was born to represent, exchange and publish information on the Web, but now it has spread in many other applications. Due to this success, the W3C has proposed a new query language, XQuery, specifically designed to query XML data. XQuery allows to obtain exact answers to queries; however when applied to large XML repositories or warehouses, such precise queries may require high response times. Our research proposes a methodology for the semi-automatic derivation of summarized documents (synopses) for massive, heterogeneous XML data-sets, with the final aim of producing query transformation rules from queries on the original data-sets to queries on the summarized data-set.

Introduction and Motivation

In the last few years, XML has spread in many application fields and today it is used as a format to exchange data on the web, to ensure interoperability among applications. Due to this success, the W3C has proposed a new query language, XQuery [W3C04], specifically designed to query XML data. XQuery is a well-defined but rather complex language [HPG04]. In this work we propose a new approach to overcome the problem of the high computational costs required by aggregate queries over massive XML data collections. In traditional relational warehouses [GPA+98] a similar problem is solved by means of fast approximate queries, that use concise data statistics based on histograms or on other statistical techniques. Their most common application is for aggregate queries in modern decision support systems, where large volumes of data need to be queried, and quick and interactive responses from the DBMS are claimed, e.g., to analyze the data in the warehouse in order to get trend information to evaluate marketing strategies. In such applications, users are often more interested to obtain an approximate answer computed in a short time rather than an exact one obtained in some minutes or, at the worst, hours.
