Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 2017
DOI: 10.1145/3097983.3098062

Discovering Reliable Approximate Functional Dependencies

Abstract: Given a database and a target attribute of interest, how can we tell whether there exists a functional, or approximately functional dependence of the target on any set of other attributes in the data? How can we reliably, without bias to sample size or dimensionality, measure the strength of such a dependence? And, how can we efficiently discover the optimal or $\alpha$-approximate top-$k$ dependencies? These are exactly the questions we answer in this paper. As we want to be agnostic on the form of the depe…

Cited by 37 publications (61 citation statements)
References 20 publications
“…Moreover, bounding functions can be incorporated as an early termination criterion. For the reliable fraction of information in particular, there is potential to prune many of the higher levels of the search space as it favors solutions that are small in cardinality [14]. The algorithm is presented in Algorithm 2.…”
Section: B Search Algorithms (mentioning)
confidence: 99%
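
The score named in the statement above can be illustrated with a small sketch. It assumes the plain fraction of information is F(X;Y) = (H(Y) - H(Y|X)) / H(Y), and that the "reliable" variant subtracts the expected value of that score when the target is randomly permuted. The function names are hypothetical, and the Monte Carlo estimate of the expectation is a stand-in chosen for brevity, not necessarily how the paper computes it.

```python
import math
import random
from collections import Counter


def entropy(values):
    """Empirical Shannon entropy (in bits) of a sequence of hashable values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def fraction_of_information(x, y):
    """Plain score F(X;Y) = (H(Y) - H(Y|X)) / H(Y), with H(Y|X) = H(X,Y) - H(X)."""
    h_y = entropy(y)
    if h_y == 0:
        return 0.0  # a constant target leaves nothing to explain
    h_y_given_x = entropy(list(zip(x, y))) - entropy(x)
    return (h_y - h_y_given_x) / h_y


def corrected_fraction_of_information(x, y, n_permutations=200, seed=0):
    """Plain score minus its average over random permutations of y.

    Assumption: the average over shuffles is a Monte Carlo stand-in for the
    expected score under a permutation null model.
    """
    rng = random.Random(seed)
    shuffled = list(y)
    total = 0.0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        total += fraction_of_information(x, shuffled)
    return fraction_of_information(x, y) - total / n_permutations


# A unique-ID column "explains" any target perfectly under the plain score,
# but the correction removes that spurious dependence.
ids = list(range(8))
target = [0, 1, 0, 1, 0, 1, 0, 1]
plain = fraction_of_information(ids, target)
corrected = corrected_fraction_of_information(ids, target)
```

In this toy case the plain score cannot distinguish a row identifier from a genuine determinant, while the corrected score penalizes attribute sets with many distinct values. That is consistent with the statement's remark that the measure favors solutions of small cardinality.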
“…Computing this function is of course equivalent to the original optimization problem and hence NP-hard. We can, however, … [Figure 3: Evaluating $f_{spc}$ for branch-and-bound optimization. Relative nodes explored difference (left) and relative runtime difference (right) between methods OPUS$_{spc}$ and OPUS$_{mon}$.]…”
Section: Refined Bounding Function (mentioning)
confidence: 99%
“…As there is no known combination of physical properties that fully describes δ E , we mine the top 10 features that each explain as much of δ E as possible [15], and therewith obtain 10 cause effect pairs where we set δ E as X and one of the mined features as Y . After consulting with domain experts, we assume Y → X as ground truth for all pairs.…”
Section: Case Study: Octet Binary Semi Conductors (mentioning)
confidence: 99%
“…Wang et al give counting-based algorithms for deriving AFDs and their probabilities [3]. Mandros et al propose to search AFDs by adopting an information theoretic approach [4]. Above approaches only consider AFDs with single attribute on the left-hand side, which are not applicable for tables with more than one entity column.…”
Section: Introduction (mentioning)
confidence: 99%