Cardinality estimation

Harmouch, Hazar; Naumann, Felix

doi:10.1145/3186728.3164145

Cited by 56 publications

(7 citation statements)

References 31 publications


“…Probabilistic data structures also known as data sketches are increasingly used to process big data or high speed data streams [1]. There are for example probabilistic data structures to estimate the cardinality of a set [2], the frequency of elements on a set [3], and the similarity between two sets [4]. Another function commonly implemented with probabilistic data structures is checking if an element belongs to a set.…”

Section: Introduction (mentioning)

confidence: 99%

On the Security of Quotient Filters: Attacks and Potential Countermeasures

Reviriego, González, Dayan et al. 2024

IEEE Trans. Comput.


The security of probabilistic data structures is increasingly important due to their wide adoption in many computing systems and applications. In particular, the security of approximate membership check filters such as Bloom or cuckoo filters has been recently studied showing how an attacker can degrade the filter performance in some settings. In this paper, we consider for the first time the security of another popular approximate membership check filter, the Quotient Filter (QF). Our analysis and simulations show that quotient filters are vulnerable to both white and black box attackers that can cause insertion failures and degrade the filter performance very significantly. An interesting finding is that quotient filters are vulnerable to a new type of attack, not applicable to Bloom or cuckoo filters, that can degrade the speed of queries dramatically. The paper also briefly discusses and evaluates potential countermeasures to detect and protect against those attacks.



“…Accuracy query cost estimation can guide plan selection and help to avoid large overhead. Although the problem of estimating query cost has been studied for decades in relational databases [5–9], there are relatively few similar efforts for graph databases. How to accurately estimate the query cost is challenging in graph queries, especially for property graph queries that contain complex query structures, join relationships, and numerous properties.…”

Section: Introduction (mentioning)

confidence: 99%

Query cost estimation in graph databases via emphasizing query dependencies by using a neural reasoning network

et al. 2023

Concurrency and Computation


Summary: With the increasing complexity of graph queries, query cost estimation has become a key challenge in graph databases. Accurate estimation results are critical for database administrators or database management systems to perform query processing or optimization tasks. An efficient and accurate estimation model can improve the estimation quality and make the produced results credible. Although learning‐based methods have been applied in query cost estimation, most of them are directed at relational queries and cannot be directly used for graph queries. Furthermore, most estimation approaches focus on the correlations between predicates or columns. The dependencies between query schema and query filter conditions and the correlation between query schema are ignored. In this study, we construct a novel deep learning model composed of reasoning and retrieval processes that can accurately capture the potential logical relationships in graph queries. This solves the above problems to some extent. In addition, we propose a query estimation framework that divides the estimation task into query workload generation, training data collection, feature extraction and encoding, and estimation model construction. The results of the experiment on real‐world datasets show that our estimation model can improve the estimation quality and outperforms other compared deep learning models in terms of estimation accuracy.


“…However, using this aggregation to approximate the union dataset's cardinality may exhibit a large estimation error because different DHs' datasets may have many elements in common. To address this challenge, a number of sketch methods [6] have been proposed such as the Flajolet-Martin (in short FM) sketch [7] and the HyperLogLog (in short HLL) sketch [8]. The key to these methods is the construction of a compact and mergeable sketch (i.e., a data summary) on each DH's dataset.…”

Section: Introduction (mentioning)

confidence: 99%

An Effective and Differentially Private Protocol for Secure Distributed Cardinality Estimation

Wang, Yang, Xie et al. 2023

Preprint


Counting the number of distinct elements distributed over multiple data holders is a fundamental problem with many real-world applications ranging from crowd counting to network monitoring. Although a number of space and computationally efficient sketch methods (e.g., the Flajolet-Martin sketch and the HyperLogLog sketch) for cardinality estimation have been proposed to solve the above problem, these sketch methods are insecure when considering privacy concerns related to the use of each data holder's personal dataset. Despite a recently proposed protocol that successfully implements the well-known Flajolet-Martin (FM) sketch on a secret-sharing based multiparty computation (MPC) framework for solving the problem of private distributed cardinality estimation (PDCE), we observe that this MPC-FM protocol is not differentially private. In addition, the MPC-FM protocol is computationally expensive, which limits its applications to data holders with limited computation resources. To address the above issues, in this paper we propose a novel protocol DP-DICE, which is computationally efficient and differentially private for solving the problem of PDCE. Experimental results show that our DP-DICE achieves orders of magnitude speedup and reduces the estimation error by several times in comparison with state-of-the-arts under the same security requirements.
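The mergeability property mentioned in the citation statement for this paper can be illustrated with a minimal sketch. It assumes a simplified HyperLogLog (SHA-1 hashing, 2^12 registers, only the small-range linear-counting correction); the class and method names are illustrative and are not taken from the cited papers, and no privacy mechanism such as DP-DICE or MPC-FM is modeled.

```python
import hashlib
import math


class HyperLogLog:
    """Simplified HyperLogLog sketch (illustrative, not a production implementation)."""

    def __init__(self, p=12):
        self.p = p                      # number of index bits (assumed default)
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m

    def add(self, item):
        # Take 64 bits of a SHA-1 digest as the hash value.
        h = int.from_bytes(hashlib.sha1(str(item).encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)                      # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)         # remaining 64 - p bits
        # Rank = position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other):
        # Register-wise maximum yields the sketch of the union of both inputs.
        merged = HyperLogLog(self.p)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)         # bias constant for m >= 128
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            return self.m * math.log(self.m / zeros)  # linear counting for small sets
        return raw


# Two data holders whose datasets overlap in 20,000 elements.
sketch_a, sketch_b = HyperLogLog(), HyperLogLog()
for x in range(0, 60_000):
    sketch_a.add(x)
for x in range(40_000, 100_000):
    sketch_b.add(x)

merged = sketch_a.merge(sketch_b)
print("sum of per-holder estimates:", round(sketch_a.estimate() + sketch_b.estimate()))
print("estimate from merged sketch:", round(merged.estimate()))
```

Adding the per-holder estimates counts the shared elements twice, whereas the merged sketch is identical to the one that would have been built over the union, so its estimate targets the true distinct count of 100,000.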
