Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data 2013
DOI: 10.1145/2463676.2465312

Quantiles over data streams

Abstract: A fundamental problem in data management and analysis is to generate descriptions of the distribution of data. It is most common to give such descriptions in terms of the cumulative distribution, which is characterized by the quantiles of the data. The design and engineering of efficient methods to find these quantiles has attracted much study, especially in the case where the data is described incrementally, and we must compute the quantiles in an online, streaming fashion. Yet while such algorithms have prov…

Cited by 50 publications (39 citation statements)
References: 31 publications
“…We compare against a number of alternative quantile summaries: a mergeable equi-width histogram (EW-Hist) using power-of-two ranges [65], the 'GKArray' (GK) variant of the Greenwald Khanna [34,52] sketch, the AVL-tree T-Digest (T-Digest) [28] sketch, the streaming histogram (S-Hist) in [12] as implemented in Druid, the 'Random' (RandomW) sketch from [52,77], reservoir sampling (Sampling) [76], and the low discrepancy mergeable sketch (Merge12) from [3], both implemented in the Yahoo! datasketches library [1].…”
Section: Methods (mentioning)
confidence: 99%
“…We quantify the accuracy of a quantile estimate using the quantile error ε as defined in Section 3.1. Then, as in [52,77], we can compare the accuracies of summaries on a given dataset by computing their average error ε_avg over a set of uniformly spaced ϕ-quantiles. In the evaluation that follows, we test on 21 equally spaced ϕ between 0.01 and 0.99.…”
Section: Methods (mentioning)
confidence: 99%
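The evaluation protocol quoted above can be sketched in a few lines of Python. The citing paper's own error definition (its Section 3.1) is not reproduced on this page, so the sketch assumes the common normalized rank error, |rank(estimate)/n − ϕ|. All function names are hypothetical, and the uniform-sample summary is only a stand-in for whichever sketch is being evaluated.

```python
import bisect
import random


def rank_error(sorted_data, estimate, phi):
    """Normalized rank error of `estimate` as a phi-quantile (assumed definition)."""
    n = len(sorted_data)
    # Every rank between the first and last occurrence of `estimate` is valid.
    lo = bisect.bisect_left(sorted_data, estimate)
    hi = bisect.bisect_right(sorted_data, estimate)
    target = phi * n
    if lo <= target <= hi:
        return 0.0
    return min(abs(lo - target), abs(hi - target)) / n


def evaluation_phis(count=21, low=0.01, high=0.99):
    """Equally spaced quantile fractions, e.g. 21 values between 0.01 and 0.99."""
    step = (high - low) / (count - 1)
    return [low + i * step for i in range(count)]


def average_error(sorted_data, estimator, phis):
    """Mean rank error of `estimator` (a function phi -> value) over `phis`."""
    return sum(rank_error(sorted_data, estimator(p), p) for p in phis) / len(phis)


def sample_quantile(sorted_sample, phi):
    """Quantile estimate read directly from a sorted sample (stand-in summary)."""
    idx = min(int(phi * len(sorted_sample)), len(sorted_sample) - 1)
    return sorted_sample[idx]


if __name__ == "__main__":
    rng = random.Random(0)
    data = sorted(rng.gauss(0.0, 1.0) for _ in range(100_000))
    sample = sorted(rng.sample(data, 1_000))  # hypothetical summary under test
    phis = evaluation_phis()
    eps_avg = average_error(data, lambda p: sample_quantile(sample, p), phis)
    print(f"average rank error over {len(phis)} quantiles: {eps_avg:.4f}")
```

Averaging over a fixed grid of ϕ values gives a single number per summary and dataset, which makes different sketches directly comparable. It can hide poor accuracy at the extreme tails, since the grid stops at 0.01 and 0.99.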
“…A simple deterministic version of their algorithm achieves the same bounds. This was pointed out, for example, by [1]. We refer to their algorithm as MRL.…”
Section: Related Work (mentioning)
confidence: 97%
“…Shrivastava et al. [24] present a streaming algorithm for ε-approximate quantiles called the "QDigest" that has a space complexity of O((1/ε) log U), where U is the size of the input domain. Wang et al. [26] performed an experimental evaluation of different streaming algorithms [15,24,19]. They concluded that MRL99 [19] and Greenwald-Khanna [15] are two very competitive algorithms, with MRL99 performing slightly better than Greenwald-Khanna in terms of space requirement and time for a given accuracy.…”
Section: Related Work (mentioning)
confidence: 99%