2017
DOI: 10.1145/3186728.3164145

Cardinality estimation

Abstract: Data preparation and data profiling comprise many tasks, both basic and complex, that analyze a dataset at hand and extract metadata such as data distributions, key candidates, and functional dependencies. Among the most important types of metadata is the number of distinct values in a column, also known as the zeroth-frequency moment. Cardinality estimation itself has been an active research topic in the past decades due to its many applications. The aim of this paper is to review the literature of cardinality e…

Cited by 56 publications (7 citation statements)
References 31 publications
“…Probabilistic data structures, also known as data sketches, are increasingly used to process big data or high-speed data streams [1]. There are, for example, probabilistic data structures to estimate the cardinality of a set [2], the frequency of elements in a set [3], and the similarity between two sets [4]. Another function commonly implemented with probabilistic data structures is checking if an element belongs to a set.…”
Section: Introduction (mentioning)
confidence: 99%
“…Accurate query cost estimation can guide plan selection and help to avoid large overhead. Although the problem of estimating query cost has been studied for decades in relational databases [5–9], there are relatively few similar efforts for graph databases. Accurately estimating the query cost is challenging in graph queries, especially for property graph queries that contain complex query structures, join relationships, and numerous properties.…”
Section: Introduction (mentioning)
confidence: 99%
“…However, using this aggregation to approximate the union dataset's cardinality may exhibit a large estimation error because different DHs' datasets may have many elements in common. To address this challenge, a number of sketch methods [6] have been proposed, such as the Flajolet-Martin (FM) sketch [7] and the HyperLogLog (HLL) sketch [8]. The key to these methods is the construction of a compact and mergeable sketch (i.e., a data summary) on each DH's dataset.…”
Section: Introduction (mentioning)
confidence: 99%
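
The last statement turns on sketches being mergeable: sketches built separately over overlapping datasets can be combined into a sketch of their union without double counting shared elements. The following is a minimal illustrative sketch of that idea in Python, loosely following the HyperLogLog scheme. The class name `TinyHLL`, the choice of SHA-1 as the hash, and the default precision `p=12` are assumptions made for this example; it is not the implementation used in any of the cited works.

```python
import hashlib
import math


class TinyHLL:
    """Simplified HyperLogLog-style sketch (illustrative, not production code)."""

    def __init__(self, p=12):
        self.p = p                      # number of index bits (assumed default)
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m

    def add(self, item):
        # Derive a 64-bit hash from the item's string form (assumed hashing scheme).
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - self.p)                    # top p bits select a register
        rest = h & ((1 << (64 - self.p)) - 1)       # remaining 64 - p bits
        # Rank = position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def merge(self, other):
        # The union sketch is the register-wise maximum of both sketches.
        if self.p != other.p:
            raise ValueError("sketches must use the same precision")
        out = TinyHLL(self.p)
        out.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return out

    def estimate(self):
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)            # bias constant, valid for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            # Small-range correction via linear counting.
            return m * math.log(m / zeros)
        return raw


if __name__ == "__main__":
    a, b = TinyHLL(), TinyHLL()
    for i in range(60_000):
        a.add(i)
    for i in range(40_000, 100_000):
        b.add(i)
    print("sum of separate estimates:", round(a.estimate() + b.estimate()))
    print("estimate from merged sketch:", round(a.merge(b).estimate()))
```

In this example the two datasets share 20,000 elements, so adding the two separate estimates overcounts, while the merged sketch is identical to one built directly on the union. With 4096 registers the expected relative error is on the order of a few percent.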