Proceedings of the 2016 International Conference on Management of Data 2016
DOI: 10.1145/2882903.2882915

Streaming Algorithms for Robust Distinct Elements

Abstract: We study the problem of estimating distinct elements in the data stream model, which has a central role in traffic monitoring, query optimization, data mining and data integration. Unlike all previous work, we study the problem in the noisy data setting, where two different-looking items in the stream may reference the same entity (determined by a distance function and a threshold value), and the goal is to estimate the number of distinct entities in the stream. In this paper, we formalize the problem …
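To make the problem statement concrete, the following is a minimal sketch of a naive baseline, not the algorithm from the paper. It assumes the data is well separated, meaning items of the same entity are within the threshold of each other and items of different entities are farther apart. The names `count_entities_naive`, `dist`, and `alpha` are hypothetical and chosen only for illustration.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def count_entities_naive(
    stream: Iterable[T],
    dist: Callable[[T, T], float],
    alpha: float,
) -> int:
    """Count distinct entities by keeping one representative per entity.

    Assumption (not from the source): two items refer to the same entity
    exactly when dist(a, b) <= alpha, and the data is well separated.
    """
    representatives: List[T] = []
    for item in stream:
        # An item starts a new entity only if it is farther than alpha
        # from every representative stored so far.
        if not any(dist(item, rep) <= alpha for rep in representatives):
            representatives.append(item)
    return len(representatives)


if __name__ == "__main__":
    noisy_stream = [0.0, 0.1, 5.0, 5.05, 0.05, 9.0]
    print(count_entities_naive(noisy_stream, lambda a, b: abs(a - b), 0.5))
```

This baseline stores one item per entity, so its memory grows with the answer itself. It illustrates only the definition of the quantity being estimated, not a small-space streaming method.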

Cited by 5 publications (10 citation statements)
References 36 publications
“…We remark, as also pointed out in [9], that we cannot place our hope on a magic hash function that can map all the near-duplicates into the same element and otherwise into different elements, simply because such a magic hash function, if exists, needs a lot of bits to describe.…”
Section: Introduction (mentioning)
confidence: 81%
“…This general problem has been recently proposed in [9], where the authors studied the estimation of the number of distinct elements of the data stream (also called F_0). In this paper we extend this line of research by studying another fundamental problem in the data stream literature: the distinct sampling (a.k.a.…”
Section: Introduction (mentioning)
confidence: 99%