2007 IEEE 23rd International Conference on Data Engineering
DOI: 10.1109/icde.2007.369032

Efficiently Detecting Inclusion Dependencies

Abstract: Data sources for data integration often come with spurious schema definitions such as undefined foreign key constraints. Such metadata are important for querying the database and for database integration. We present our algorithm SPIDER (Single Pass Inclusion DEpendency Recognition) for detecting inclusion dependencies (INDs), as these are the automatically testable part of a foreign key constraint. For IND detection, all pairs of attributes must be tested. SPIDER solves this task very efficiently by testing all attrib…
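To make the search space concrete: with n attributes there are n(n−1) ordered candidates A ⊆ B, and a naive approach tests each one separately. The following is a minimal Python sketch of that brute-force baseline, not SPIDER itself. The dict-of-lists input, the column names, and the choice to skip `None` as NULL are all assumptions made for illustration.

```python
from itertools import permutations


def naive_unary_inds(columns: dict[str, list]) -> list[tuple[str, str]]:
    """Brute-force unary IND test over every ordered attribute pair.

    columns maps a (hypothetical) attribute name to its values; None is
    treated as NULL and ignored, which is an assumption of this sketch.
    """
    # Distinct non-NULL values of each attribute.
    value_sets = {name: {v for v in vals if v is not None} for name, vals in columns.items()}
    inds = []
    # permutations(..., 2) yields all n * (n - 1) ordered candidates (A, B).
    for dep, ref in permutations(value_sets, 2):
        if value_sets[dep] <= value_sets[ref]:  # A ⊆ B holds
            inds.append((dep, ref))
    return inds


if __name__ == "__main__":
    cols = {
        "orders.customer_id": [1, 2, 2, 3],
        "customers.id": [1, 2, 3, 4],
        "orders.qty": [1, 1, 2],
    }
    print(naive_unary_inds(cols))
    # [('orders.customer_id', 'customers.id'), ('orders.qty', 'orders.customer_id'), ('orders.qty', 'customers.id')]
```

Only the first result is a plausible foreign key; the two INDs on `orders.qty` hold merely because small integers happen to overlap.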

Cited by 33 publications (33 citation statements). References 7 publications.

“…Surprisingly, little previous work deals with the case of discovering multi-column foreign keys [14]. Even for single-column keys, existing work is limited and focuses mainly on identifying inclusion dependencies, since the only formal requirement for specifying a foreign key constraint is that the foreign key be a subset of the primary key [1,14]. However, checking only for inclusion can lead to a large number of false positives.…”
Section: Trade History (mentioning; confidence: 99%)
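The false-positive problem raised in this statement can be shown on toy data. A foreign key must reference a key, so one obvious extra filter is to require the referenced column to be unique. The sketch below is a hypothetical illustration of that single filter, not the citing paper's method; the column names and the treatment of `None` as NULL are assumptions. It shows that even with the uniqueness check, inclusion alone still admits a spurious candidate.

```python
def fk_candidates(columns: dict[str, list]) -> list[tuple[str, str]]:
    """Keep IND A ⊆ B only when B is unique (a necessary condition for a key)."""
    # Distinct non-NULL values per column (None treated as NULL by assumption).
    sets = {n: {v for v in vals if v is not None} for n, vals in columns.items()}
    # A column is unique if no non-NULL value repeats.
    unique = {
        n: len([v for v in vals if v is not None]) == len(sets[n])
        for n, vals in columns.items()
    }
    out = []
    for dep in sets:
        for ref in sets:
            if dep != ref and unique[ref] and sets[dep] <= sets[ref]:
                out.append((dep, ref))
    return out


if __name__ == "__main__":
    cols = {
        "orders.customer_id": [1, 2, 2, 3],
        "customers.id": [1, 2, 3, 4],
        "orders.qty": [1, 1, 2],
    }
    print(fk_candidates(cols))
    # [('orders.customer_id', 'customers.id'), ('orders.qty', 'customers.id')]
```

The second pair passes every test here yet is clearly not a foreign key, which is why inclusion (plus uniqueness) is only a necessary condition.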
“…A frequent real-world use-case of multi-column profiling is the discovery of foreign keys [96,123] with the help of inclusion dependencies [14,100]. An inclusion dependency states that all values or value combinations from one set of columns also appear in the other set of columns, a prerequisite for a foreign key.…”
Section: Dependencies (mentioning; confidence: 99%)
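The definition above covers "value combinations", i.e. n-ary INDs over column lists. A minimal Python sketch of the check, assuming rows are dicts and ignoring NULL semantics (both assumptions for illustration), makes clear that the combination must match, not each column separately:

```python
def nary_ind_holds(dep_rows: list[dict], dep_cols: list[str],
                   ref_rows: list[dict], ref_cols: list[str]) -> bool:
    """Check R[dep_cols] ⊆ S[ref_cols]: every value combination of the
    dependent columns must appear as a combination in the referenced columns."""
    if len(dep_cols) != len(ref_cols):
        raise ValueError("both sides of an IND need the same number of columns")
    # All value combinations present on the referenced side.
    ref_combos = {tuple(row[c] for c in ref_cols) for row in ref_rows}
    return all(tuple(row[c] for c in dep_cols) in ref_combos for row in dep_rows)


if __name__ == "__main__":
    orders = [{"cust": 1, "region": "EU"}, {"cust": 2, "region": "US"}]
    customers = [{"id": 1, "region": "US"}, {"id": 2, "region": "EU"}]
    # Each column is included on its own...
    print(nary_ind_holds(orders, ["cust"], customers, ["id"]))          # True
    print(nary_ind_holds(orders, ["region"], customers, ["region"]))    # True
    # ...but the combination (1, "EU") never occurs in customers.
    print(nary_ind_holds(orders, ["cust", "region"], customers, ["id", "region"]))  # False
```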
“…The SPIDER algorithm [14] is another example, which preprocesses the data by sorting the values of each column and writing them to disk. Next, each sorted stream, corresponding to the values of one particular attribute, is consumed in parallel in a cursor-like manner, and an IND A ⊆ B can be discarded as soon as we detect a value in A that is not present in B.…”
Section: Generating Unary Inclusion Dependencies (mentioning; confidence: 99%)
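The cursor-based procedure described in this excerpt can be sketched in Python. This is a simplified, in-memory illustration, not the authors' implementation: the sorted distinct values stay in lists instead of being written to disk, a `heapq` min-heap stands in for the parallel cursors, all values are assumed mutually comparable (e.g. all strings or all integers), and `None` is treated as NULL and skipped. For each value, the set of attributes containing it is intersected into each such attribute's candidate referenced set, so A ⊆ B is dropped as soon as A yields a value that B lacks.

```python
import heapq


def spider_unary_inds(columns: dict[str, list]) -> set[tuple[str, str]]:
    """Single-pass unary IND discovery over sorted value streams (sketch)."""
    # Preprocessing: sorted distinct non-NULL values per attribute.
    sorted_vals = {a: sorted({v for v in vals if v is not None}) for a, vals in columns.items()}
    # Initially every other attribute is a candidate referenced attribute.
    refs = {a: set(sorted_vals) - {a} for a in sorted_vals}
    # One cursor (current value, attribute, position) per non-empty attribute.
    heap = [(vals[0], a, 0) for a, vals in sorted_vals.items() if vals]
    heapq.heapify(heap)
    while heap:
        value = heap[0][0]
        # Collect every cursor positioned on the current smallest value.
        group = []
        while heap and heap[0][0] == value:
            group.append(heapq.heappop(heap))
        attrs = {a for _, a, _ in group}
        # Any B not holding this value cannot be referenced by an A that holds it.
        for a in attrs:
            refs[a] &= attrs
        # Advance each cursor in the group to its next value, if any.
        for _, a, pos in group:
            if pos + 1 < len(sorted_vals[a]):
                heapq.heappush(heap, (sorted_vals[a][pos + 1], a, pos + 1))
    return {(a, b) for a, bs in refs.items() for b in bs}


if __name__ == "__main__":
    cols = {"a": ["x", "y"], "b": ["x", "y", "z"], "c": ["z"]}
    print(sorted(spider_unary_inds(cols)))  # [('a', 'b'), ('c', 'b')]
```

Each sorted value list is read exactly once, no matter how many candidates an attribute takes part in. An attribute with no non-NULL values ends up reported as included in every other attribute, since the empty set is a subset of any set; the real algorithm's handling of that case may differ.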
“…SPIDER helps to handle this task very efficiently by testing all the attributes in parallel. It can analyze a 2 GB database in 20 minutes and a 21 GB database in 4 hours [3]. Maayan Gafny et al. proposed a detection method to identify suspicious users in the database through an application. The malicious activity is detected when the result-sets are sent to the user with respect to the request that the user sent.…”
Section: Related Work (mentioning; confidence: 99%)