2007
DOI: 10.1137/050643672

Range‐Efficient Counting of Distinct Elements in a Massive Data Stream

Abstract: Efficient one-pass estimation of F_0, the number of distinct elements in a data stream, is a fundamental problem arising in various contexts in databases and networking. We consider range-efficient estimation of F_0: estimation of the number of distinct elements in a data stream where each element of the stream is not just a single integer but an interval of integers. We present a randomized algorithm which yields an (ε, δ)-approximation of F_0, with the following time and space complexities (n is …
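To make the problem statement concrete, the following minimal sketch computes the exact value of F_0 for a stream of integer intervals by sorting and merging them. It is a baseline for checking answers, not the randomized algorithm the abstract refers to: it stores the whole stream and is not one-pass. The function name `exact_range_f0` and the closed-interval convention `[lo, hi]` are assumptions made for illustration.

```python
from typing import Iterable, Tuple


def exact_range_f0(stream: Iterable[Tuple[int, int]]) -> int:
    """Exact count of distinct integers covered by closed intervals [lo, hi].

    Illustrative baseline only: it buffers and sorts the whole stream,
    so it uses space linear in the number of intervals.
    """
    total = 0
    cur_lo = cur_hi = None
    for lo, hi in sorted(stream):
        if lo > hi:
            raise ValueError(f"empty interval: [{lo}, {hi}]")
        if cur_hi is None or lo > cur_hi:
            # Disjoint from the current merged run: close it out and start anew.
            if cur_hi is not None:
                total += cur_hi - cur_lo + 1
            cur_lo, cur_hi = lo, hi
        else:
            # Overlaps the current run: extend its right end if needed.
            cur_hi = max(cur_hi, hi)
    if cur_hi is not None:
        total += cur_hi - cur_lo + 1
    return total


if __name__ == "__main__":
    # [1, 5] and [3, 8] overlap and cover 1..8; [10, 10] adds one more element.
    print(exact_range_f0([(1, 5), (3, 8), (10, 10)]))
```

The cost per interval here is independent of the interval's length, which is the property a range-efficient streaming algorithm also aims for, while additionally keeping space small.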

Citations: cited by 33 publications (44 citation statements)
References: 14 publications (21 reference statements)
“…For p ≤ 1, there is no expectation, while for general p these are heavy-tailed, so there is a nonnegligible (1/poly(n)) probability of observing a value that is poly(n) times c. Nevertheless, the references above give (ε, δ)-approximations for these problems in O*(1) space, and by our transformation, we obtain near-optimal PAPs. We also get a near-optimal PAP for the related distinct summation problem in sensor networks [53], which also does not have a sharply concentrated NBE. Here, for each j ∈ [n] there is a v_j ∈ {1, .…”
Section: Repeat the Following
Citation type: mentioning
confidence: 99%
“…, M}^n, and the max-dominance norm is $\sum_{j=1}^{n} \max(x_j, y_j)$. This problem, and its generalization, the dominant p-norm, defined as $\left(\sum_{j=1}^{n} \max(x_j, y_j)^p\right)^{1/p}$ for $p > 0$, are studied in [19,53,56,57,58] (in [40] this problem is instead studied for p < 0, which is useful for coordinatewise minima). There are no sharply concentrated NBEs known for p > 0.…”
Section: Repeat the Following
Citation type: mentioning
confidence: 99%
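The two quantities defined in the statement above can be evaluated directly when both vectors fit in memory. The sketch below does exactly that and nothing more; it is not a streaming or sketch-based estimator of the kind the cited works study. The function names `max_dominance_norm` and `dominant_p_norm` are hypothetical, and the vectors are assumed to have equal length with nonnegative entries.

```python
from typing import Sequence


def max_dominance_norm(x: Sequence[float], y: Sequence[float]) -> float:
    """Sum over j of max(x_j, y_j), computed exactly."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return sum(max(a, b) for a, b in zip(x, y))


def dominant_p_norm(x: Sequence[float], y: Sequence[float], p: float) -> float:
    """(Sum over j of max(x_j, y_j)^p)^(1/p) for p > 0, computed exactly."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if p <= 0:
        raise ValueError("this sketch only covers p > 0")
    return sum(max(a, b) ** p for a, b in zip(x, y)) ** (1.0 / p)


if __name__ == "__main__":
    x = [1, 4, 2]
    y = [3, 1, 2]
    # Coordinatewise maxima are 3, 4, 2.
    print(max_dominance_norm(x, y))
    print(dominant_p_norm(x, y, 2))
```

Setting p = 1 in `dominant_p_norm` recovers the max-dominance norm, which is a quick consistency check on the two definitions.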
“…This work uses a different technique than was used in [45]. The algorithm in [45] was based on sampling, along the lines of [43], while our algorithm is based on sub-sampling and linear sketches with special properties.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…The first works to try to overcome this were for 1-dimensional rectangles, i.e., line segments. The notion of a range-efficient sketch was introduced in [5] for F_k-estimation, and further refined for F_0 estimation in [43,47], allowing one to process a segment in time only logarithmic in its length. Many other problems have been reduced to range-efficient F_k-estimation, for k ≥ 0, such as distinct summation problem [18,43], duplicate insensitive sketches [38], maximum-dominance norm [19], self-join size of the symmetric difference of relations [44], and counting triangles in graphs [5].…”
Section: Problem Definition
Citation type: mentioning
confidence: 99%