2021
DOI: 10.48550/arxiv.2106.06525
Preprint

ExtendedHyperLogLog: Analysis of a new Cardinality Estimator

Abstract: We discuss the problem of counting distinct elements in a stream. A stream is usually considered as a sequence of elements that arrive one at a time. An exact solution to the problem requires memory space of the size of the stream. For many applications this solution is infeasible because the streams are very large. The solution in that case is to use a probabilistic data structure (also called a sketch), from which we can estimate the cardinality of the stream with high accuracy. We present a new algorithm, ExtendedHyper…
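To make the idea of a sketch concrete, here is a minimal Python sketch of the classic HyperLogLog estimator, the baseline that the paper's algorithm extends. It is not ExtendedHyperLogLog itself, whose details are cut off in the abstract above. The class name, the choice of SHA-1 as the hash function, and the parameter `p` (number of index bits) are assumptions made for illustration only.

```python
import hashlib
import math


class HyperLogLog:
    """Classic HyperLogLog baseline (illustrative; not ExtendedHyperLogLog)."""

    def __init__(self, p=12):
        # Assumption: p >= 7, so the bias constant below (valid for m >= 128) applies.
        self.p = p
        self.m = 1 << p                      # number of registers
        self.registers = [0] * self.m
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item):
        # Hash the item to 64 bits (SHA-1 is an arbitrary illustrative choice).
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - self.p)                  # first p bits select a register
        rest = h & ((1 << (64 - self.p)) - 1)     # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def estimate(self):
        # Harmonic-mean estimator over all registers.
        z = sum(2.0 ** -r for r in self.registers)
        raw = self.alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return raw


# Usage: 100,000 stream elements, of which 50,000 are distinct.
hll = HyperLogLog(p=12)
for i in range(100_000):
    hll.add(f"user-{i % 50_000}")
estimate = hll.estimate()
```

With `p=12` the sketch keeps 4096 small registers regardless of stream length, and the theoretical relative standard error of HyperLogLog is about 1.04/sqrt(4096), roughly 1.6%. An exact count would instead have to remember every distinct element seen so far.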

Citations: cited by 1 publication (1 citation statement)
References: 9 publications
“…This database split/partition idea of HNSW, or other graph-based NNS libraries, has been successfully applied in several industrial-level applications for other purposes (31,32). We also want to point out that the size of the compressed sketches (database file) from SetSketch is comparable to other space efficient algorithms such as HyperLogLog (with b=2, Setsketch asymptotically corresponds to HyperLogLog), ExtendedHyperLogLog (47), HyperLogLogLog (48) and UltraLogLog (49) (25% more space efficient than HyperLogLog). The Shannon entropy of SetSketch we implemented is…”
Section: Discussion (citation type: mentioning)
confidence: 99%