2018
DOI: 10.14778/3204028.3204032

Discovery of genuine functional dependencies from relational data with missing values

Abstract: Functional dependencies (FDs) play an important role in maintaining data quality. They can be used to enforce data consistency and to guide repairs over a database. In this work, we investigate the problem of missing values and its impact on FD discovery. When using existing FD discovery algorithms, some genuine FDs could not be detected precisely due to missing values or some non-genuine FDs can be discovered even though they are caused by missing values with a certain NULL semantics. We define a notion of ge…
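
To make the role of NULL semantics concrete, here is a minimal Python sketch. It is not the paper's algorithm: the function `fd_holds`, the toy rows, and the two semantics flags are illustrative assumptions. It checks whether an exact FD holds on a small table when missing values are either treated as equal to one another or as pairwise distinct.

```python
from itertools import count


def fd_holds(rows, lhs, rhs, null_equals_null):
    """Check whether the exact FD lhs -> rhs holds on `rows`.

    `rows` is a list of dicts; None stands for a missing value.
    null_equals_null=True  : all NULLs compare equal to each other.
    null_equals_null=False : every NULL is a fresh, distinct value.
    (Hypothetical helper for illustration only.)
    """
    fresh = count()

    def norm(value):
        if value is not None:
            return ("val", value)
        # Either one shared NULL marker or a unique marker per occurrence.
        return ("null",) if null_equals_null else ("null", next(fresh))

    seen = {}
    for row in rows:
        key = tuple(norm(row[a]) for a in lhs)
        val = tuple(norm(row[a]) for a in rhs)
        # A violation is two rows that agree on lhs but differ on rhs.
        if seen.setdefault(key, val) != val:
            return False
    return True


# Toy data (assumed): two rows with a missing zip and different cities.
rows = [
    {"zip": None, "city": "Oslo"},
    {"zip": None, "city": "Rome"},
    {"zip": "75001", "city": "Paris"},
]

holds_eq = fd_holds(rows, ["zip"], ["city"], null_equals_null=True)
holds_neq = fd_holds(rows, ["zip"], ["city"], null_equals_null=False)
```

On this toy table, `zip -> city` is violated when NULLs are treated as equal and holds when they are treated as distinct, so whether the dependency is reported depends only on how the missing values are interpreted.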

Cited by 39 publications (18 citation statements)
References: 31 publications
“…Functional dependency (FD) is an orthogonal approach that leverages multicolumn dependency for data quality [56]. While FDs inferred from individual table instances often may not hold in a semantic sense [19,51], we nevertheless evaluate the fraction of benchmark columns that are part of any FD from their original tables, which would be a recall upper-bound for FD-based approaches. For simplicity, in this analysis, we assume a perfect precision for FD-based methods.…”
Section: Methods Compared (mentioning)
confidence: 99%
“…The above condition allows an FD to hold for most tuples allowing some violations. This equation captures the essence of approximate FDs used in multiple works [3,19,20,25,30]. Two core approaches are adopted in these works to discover approximate dependencies that satisfy:…”
Section: Functional Dependencies (mentioning)
confidence: 99%
“…(1) Use likelihood-based measures to find groups of attributes that satisfy Equation 1 [3,19,20,25]. Typically these methods compute the approximate distribution (and likelihood) by considering co-occurrence counts between values of (X, Y ) and normalizing those by counts of values of X [3,20,25].…”
Section: Functional Dependencies (mentioning)
confidence: 99%
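
The co-occurrence counting described in the statement above can be sketched in a few lines of Python. This is an illustrative assumption, not code from any of the cited works: the names `conditional_estimates` and `majority_agreement` and the toy data are hypothetical, and the score used here (the fraction of tuples that agree with the most frequent Y value for their X value) is one simple stand-in, since the "Equation 1" the statement refers to is not reproduced on this page.

```python
from collections import Counter


def conditional_estimates(pairs):
    """Estimate P(y | x) as count(x, y) / count(x) from (x, y) pairs."""
    pair_counts = Counter(pairs)
    x_counts = Counter(x for x, _ in pairs)
    return {(x, y): c / x_counts[x] for (x, y), c in pair_counts.items()}


def majority_agreement(pairs):
    """Fraction of tuples whose y is the most frequent y for their x.

    Equals 1.0 when the FD X -> Y holds exactly; lower values mean
    more violating tuples. (Assumed score, for illustration only.)
    """
    pair_counts = Counter(pairs)
    best = {}
    for (x, _), c in pair_counts.items():
        best[x] = max(best.get(x, 0), c)
    return sum(best.values()) / len(pairs)


# Toy data (assumed): one misspelled city breaks the exact FD zip -> city.
pairs = [
    ("75001", "Paris"),
    ("75001", "Paris"),
    ("75001", "Pariss"),
    ("00100", "Rome"),
]

est = conditional_estimates(pairs)
score = majority_agreement(pairs)
```

Here `est[("75001", "Paris")]` is 2/3 and `score` is 0.75, so a discovery method with a tolerance threshold below 0.75 would still accept `zip -> city` despite the single violating tuple.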
“…We devise an effective and efficient algorithm to automatically discover PFDs from dirty datasets. Note that, although profiling ICs from clean data [14,4] has been widely studied, discovering them from dirty data is known to be much harder [6,9]. (Section 4) 4.…”
Section: Contributions (mentioning)
confidence: 99%