Histograms are used in many ways in conventional databases and in data stream processing to summarize massive data distributions. Previous work on constructing histograms over data streams with provable guarantees has not taken into account the workload characteristics of databases, which show that some parts of the distribution are used far more frequently than others; on the other hand, previous work on constructing histograms that does exploit workload characteristics, and has demonstrated the significant advantage of doing so, comes with no provable guarantees on the accuracy of the histograms or on the time and space needed to obtain reasonable accuracy. We study the algorithmic complexity of constructing workload-optimal histograms on data streams. We present an algorithm that constructs a nearly optimal histogram in nearly linear time and polylogarithmic space, in one pass. In the more general cash register model, where the data arrives as a stream of updates, we can build a histogram using polylogarithmic space, polylogarithmic time to process each update, and polylogarithmic post-processing time to build the histogram. These are the first known algorithmic results with provable guarantees for workload-optimal histogram construction; they rely on a notion of linear robustness that we introduce here. All of these results require the workload to be stored explicitly, since we show that if the workload is summarized lossily in small space, no such algorithmic results exist. However, we show that our algorithmic results extend efficiently to the case in which the workload is compressed without loss, for example by run-length encoding or by a universal compression scheme such as Lempel-Ziv.
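To make the objective concrete, one standard formulation (assumed here for illustration; the precise error measure is defined later in the paper) scores a $B$-bucket histogram $H$ for a signal $A$ of length $N$ under a workload $w$ by the workload-weighted sum of squared errors:
\[
\|A - H\|_w^2 \;=\; \sum_{i=1}^{N} w_i \,\bigl(A[i] - H[i]\bigr)^2 ,
\]
where $H$ is piecewise constant with at most $B$ pieces and $w_i \ge 0$ reflects how often point $i$ is queried. Under this measure, a workload-optimal histogram is one that minimizes $\|A - H\|_w^2$ over all $B$-bucket histograms, so accuracy is concentrated on the heavily queried parts of the distribution.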