2016
DOI: 10.1007/s00778-016-0424-7

Quantiles over data streams: experimental comparisons, new analyses, and further improvements

Abstract: A fundamental problem in data management and analysis is to generate descriptions of the distribution of data. It is most common to give such descriptions in terms of the cumulative distribution, which is characterized by the quantiles of the data. The design and engineering of efficient methods to find these quantiles has attracted much study, especially in the case where the data is given incrementally, and we must compute the quantiles in an online, streaming fashion. While such algorithms have proved to be…

Cited by 41 publications (54 citation statements)
References: 29 publications
“…We compare against a number of alternative quantile summaries: a mergeable equi-width histogram (EW-Hist) using power-of-two ranges [65], the 'GKArray' (GK) variant of the Greenwald Khanna [34,52] sketch, the AVL-tree T-Digest (T-Digest) [28] sketch, the streaming histogram (S-Hist) in [12] as implemented in Druid, the 'Random' (RandomW) sketch from [52,77], reservoir sampling (Sampling) [76], and the low discrepancy mergeable sketch (Merge12) from [3], both implemented in the Yahoo! datasketches library [1].…”
Section: Methods (mentioning)
confidence: 99%
“…Note that the lower and upper bounds on the rank of any stored number differ by at most 2δN and upper (or lower) bounds on the rank of two consecutive stored numbers differ by at most 2δN as well. The space requirement of Q(δ) is O((1/δ) · log(δN)), however, in practice the space used is observed to scale linearly with 1/δ [36]. (Note that an offline optimal data structure for δ-approximate quantiles uses space O(1/δ).)…”
Section: Processing the Stream And Rounding (mentioning)
confidence: 99%
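
The O(1/δ) offline bound mentioned in the statement above can be illustrated with a minimal Python sketch. It is not taken from the cited paper. It assumes 1-based ranks and treats an answer as δ-approximate when its true rank is within δN of the requested rank; the names build_offline_summary and query_rank are hypothetical.

```python
import math


def build_offline_summary(data, delta):
    """Return ([(rank, value), ...], n) with stored ranks about 2*delta*n apart.

    Assumption: 0 < delta < 1 and data is non-empty. Ranks are 1-based.
    """
    if not data or not 0 < delta < 1:
        raise ValueError("need non-empty data and 0 < delta < 1")
    ordered = sorted(data)
    n = len(ordered)
    step = max(1, math.floor(2 * delta * n))
    # Start half a step in, so rank 1 is within step/2 of a stored rank.
    start = max(1, (step + 1) // 2)
    ranks = list(range(start, n + 1, step))
    # Cover the tail: keep the maximum if it is too far from the last entry.
    if not ranks or n - ranks[-1] > step // 2:
        ranks.append(n)
    return [(r, ordered[r - 1]) for r in ranks], n


def query_rank(summary, target_rank):
    """Return the stored (rank, value) pair whose rank is closest to target_rank."""
    return min(summary, key=lambda entry: abs(entry[0] - target_rank))
```

Because consecutive stored ranks are at most 2δN apart, any requested rank is within δN of a stored one, and the number of stored entries is roughly 1/(2δ). A streaming summary cannot sort the input first, which is where the extra log(δN) factor in the worst-case bound comes from.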
“…Aside from oblivious sampling algorithms (which require storing Ω(1/ε²) samples) the only other such work of which we are aware is an approach by Wang, Luo, Yi, and Cormode [12] that combines the methods of [1] and [8] into a hybrid with the same space bound as [1].…”
Section: Previous and Related Work (mentioning)
confidence: 99%
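
To make the oblivious-sampling baseline in the statement above concrete, here is a minimal Python sketch of reservoir sampling (the classic Algorithm R) used as a quantile estimator. It is an illustration only, not the implementation evaluated in any of the cited papers. The sample size ceil(1/ε²) drops the constant and the failure-probability factor of the Ω(1/ε²) bound, and the function names are hypothetical.

```python
import math
import random


def sample_size_for(eps):
    """Assumed sample size: 1/eps^2, with constants and log factors omitted."""
    return math.ceil((1 / eps) ** 2)


def reservoir_sample(stream, k, rng):
    """Keep a uniform random sample of up to k items from a stream (Algorithm R)."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            # Item i (0-based) replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


def estimate_quantile(sample, phi):
    """Estimate the phi-quantile of the stream from the sorted sample."""
    ordered = sorted(sample)
    index = min(len(ordered) - 1, int(phi * len(ordered)))
    return ordered[index]
```

For example, with ε = 0.01 the sketch keeps about 10,000 items regardless of the stream length, which is far more than the summaries discussed in the other statements need for the same error.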
“…[1][2][3][4][5][6][7][8][9][10][11][12][13][14][15][16][17][18] For the comparison model, the best deterministic online summary to date is the (GK) summary of Greenwald and Khanna [4], which uses O((1/ε) log(εn)) space. This improved upon a deterministic (MRL) summary of Manku, Rajagopalan, and Lindsay [7] and a summary implied by Munro and Paterson [9], which use O((1/ε) log²(εn)) space.…”
Section: Previous and Related Work (mentioning)
confidence: 99%
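
The gap between the two bounds in the statement above is easy to check numerically. The following Python sketch evaluates both expressions with the hidden constants dropped and with base-2 logarithms; both choices are assumptions made for illustration, and the function names are hypothetical.

```python
import math


def gk_space_bound(eps, n):
    """(1/eps) * log(eps * n), constants omitted, log base 2 assumed."""
    return (1 / eps) * math.log2(eps * n)


def mrl_space_bound(eps, n):
    """(1/eps) * log^2(eps * n), constants omitted, log base 2 assumed."""
    return (1 / eps) * math.log2(eps * n) ** 2


if __name__ == "__main__":
    eps, n = 0.01, 10**9
    print(gk_space_bound(eps, n), mrl_space_bound(eps, n))
```

The ratio of the two expressions is exactly log(εn), so for ε = 0.01 and n = 10⁹ the log² bound is larger by a factor of about 23 under these assumptions.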