Efficiently Detecting Inclusion Dependencies

Bauckmann, Jana; Leser, Ulf; Naumann, Felix; Tietz, Vincent

doi:10.1109/icde.2007.369032

Cited by 33 publications

(33 citation statements)

References 7 publications


“…Surprisingly, little previous work deals with the case of discovering multi-column foreign keys [14]. Even for single-column keys, existing work is limited and focuses mainly on identifying inclusion dependencies, since the only formal requirement for specifying a foreign key constraint is that the foreign key be a subset of the primary key [1,14]. However, checking only for inclusion can lead to a large number of false positives.…”

Section: Trade History (mentioning)

confidence: 99%

On multi-column foreign key discovery

et al. 2010


A foreign/primary key relationship between relational tables is one of the most important constraints in a database. From a data analysis perspective, discovering foreign keys is a crucial step in understanding and working with the data. Nevertheless, more often than not, foreign key constraints are not specified in the data, for various reasons; e.g., some associations are not known to designers but are inherent in the data, while others become invalid due to data inconsistencies. This work proposes a robust algorithm for discovering single-column and multi-column foreign keys. Previous work concentrated mostly on discovering single-column foreign keys using a variety of rules, like inclusion dependencies, column names, and minimum/maximum values. We first propose a general rule, termed Randomness, that subsumes a variety of other rules. We then develop efficient approximation algorithms for evaluating randomness, using only two passes over the data. Finally, we validate our approach via extensive experiments using real and synthetic datasets.


“…A frequent real-world use-case of multi-column profiling is the discovery of foreign keys [96,123] with the help of inclusion dependencies [14,100]. An inclusion dependency states that all values or value combinations from one set of columns also appear in the other set of columns-a prerequisite for a foreign key.…”

Section: Dependencies (mentioning)

confidence: 99%
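The inclusion dependency definition in the statement above can be illustrated with a minimal sketch. The function name `holds_ind`, the sample columns, and the choice to ignore nulls are assumptions made for illustration; they are not taken from the cited papers. Multi-column dependencies can be checked the same way by passing tuples of values.

```python
def holds_ind(dependent, referenced):
    """Return True if every non-null value in `dependent` also occurs in `referenced`.

    Values may be scalars (unary case) or tuples (value combinations).
    Ignoring None is an assumption of this sketch.
    """
    referenced_values = {v for v in referenced if v is not None}
    return all(v in referenced_values for v in dependent if v is not None)


# Hypothetical columns: a candidate foreign key and a candidate primary key.
orders_customer_id = [1, 2, 2, 3]
customers_id = [1, 2, 3, 4]

print(holds_ind(orders_customer_id, customers_id))  # True
print(holds_ind(customers_id, orders_customer_id))  # False, 4 is missing
```

A satisfied dependency is only a prerequisite: it makes the column pair a foreign key candidate, not a confirmed foreign key.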

“…The SPIDER algorithm [14] is another example, which preprocesses the data by sorting the values of each column and writing them to disk. Next, each sorted stream, corresponding to the values of one particular attribute, is consumed in parallel in a cursor-like manner, and an Ind A ⊆ B can be discarded as soon as we detect a value in A that is not present in B.…”

Section: Generating Unary Inclusion Dependencies (mentioning)

confidence: 99%
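The sort-and-merge idea described in the statement above can be sketched as follows. This is a simplified in-memory illustration, not the published SPIDER implementation: the real algorithm writes the sorted value lists to disk, whereas this sketch keeps them in memory, and the function name `unary_inds` and its input format are assumptions. It also assumes that values are mutually comparable across columns.

```python
import heapq
from itertools import groupby


def unary_inds(columns):
    """columns: dict mapping attribute name -> iterable of values.

    Returns the set of pairs (A, B) for which A ⊆ B holds.
    """
    # Step 1: sort the distinct non-null values of each attribute.
    streams = {
        name: sorted({v for v in values if v is not None})
        for name, values in columns.items()
    }
    # Every ordered pair of distinct attributes starts out as a candidate.
    candidates = {a: set(columns) - {a} for a in columns}

    # Step 2: tag each value with its attribute and merge all sorted streams,
    # which consumes them together in ascending value order.
    tagged = [[(v, name) for v in stream] for name, stream in streams.items()]
    merged = heapq.merge(*tagged)

    # Step 3: for each value, find the attributes that contain it. A candidate
    # A ⊆ B is discarded as soon as a value of A is not present in B.
    for _, group in groupby(merged, key=lambda pair: pair[0]):
        holders = {name for _, name in group}
        for a in holders:
            candidates[a] &= holders

    return {(a, b) for a, refs in candidates.items() for b in refs}


print(unary_inds({"a": [1, 2], "b": [1, 2, 3], "c": [2, 3]}))
```

Each value is visited once for all attributes together, so no pair of columns has to be compared separately. An empty column is trivially included in every other column under this sketch.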

“…If the distinct value sets of columns A and B are not available, we can estimate the Jaccard similarity using their MinHash signatures [38]. Conditional functional dependencies [24], [59], CTANE [47], CFUN [42], FACD [91], FastCFD [47] Inclusion dependencies [101], [87], SPIDER [14], ZigZag [102] Conditional inclusion dependencies [61], CINDERELLA [13], PLI [13] Foreign keys [123], [143] Denial constraints FastDC [29] Differential dependencies [128] Sequential dependencies [57] 5 Dependency detection…”

Section: Summaries and Sketches (mentioning)

confidence: 99%
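The MinHash estimate mentioned in the statement above can be sketched in a few lines. The hash construction here (salted `hashlib.blake2b` over `repr(value)`), the signature length of 128, and the function names are assumptions of this sketch, not details from the survey. Both columns are assumed to be non-empty.

```python
import hashlib


def minhash_signature(values, num_hashes=128):
    """Return a MinHash signature: the minimum hash per salted hash function."""
    distinct = set(values)
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(8, "big")  # one salt per simulated hash function
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(repr(v).encode("utf-8"),
                                digest_size=8, salt=salt).digest(),
                "big")
            for v in distinct))
    return signature


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions on which both columns agree."""
    matches = sum(x == y for x, y in zip(sig_a, sig_b))
    return matches / len(sig_a)


# Hypothetical columns with true Jaccard similarity 50 / 150.
col_a = range(0, 100)
col_b = range(50, 150)
print(estimate_jaccard(minhash_signature(col_a), minhash_signature(col_b)))
```

The signatures have a fixed size regardless of the column length, which is why they can stand in for the distinct value sets when those are not available.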


Profiling relational data: a survey

Abedjan et al. 2015

Self Cite


Profiling data to determine metadata about a given dataset is an important and frequent activity of any IT professional and researcher and is necessary for various use-cases. It encompasses a vast array of methods to examine datasets and produce metadata. Among the simpler results are statistics, such as the number of null values and distinct values in a column, its data type, or the most frequent patterns of its data values. Metadata that are more difficult to compute involve multiple columns, namely correlations, unique column combinations, functional dependencies, and inclusion dependencies. Further techniques detect conditional properties of the dataset at hand. This survey provides a classification of data profiling tasks and comprehensively reviews the state of the art for each class. In addition, we review data profiling tools and systems from research and industry. We conclude with an outlook on the future of data profiling beyond traditional profiling tasks and beyond relational databases.


“…SPIDER helps to handle this task very efficiently by testing all the attributes parallel. It can analyze 2 GB database in 20 minutes and 21 GB in 4 hours [3] Maayan Gafny et al, proposed a detection method to identify suspicious users in the database through an application. The malicious activity is detected when the result-sets are sent to the user with respect to the request that the user sent.…”

Section: Related Work (mentioning)

confidence: 99%

Policy Based Detection during Emergency and Sharing Secure Information

K. Akshaya, Priya, Ashwini et al. 2017

IJECS


Emergency scenarios are very common in the healthcare domain. These situations are unpredictable, and it is difficult to estimate the loss and the number of injuries or diseases that could occur. Hence automated systems cannot detect these emergency situations or provide new information. Elderly persons and patients who reside at home need to check their healthcare records periodically using such systems. At times this becomes time-consuming and inefficient, and the person may forget to record or monitor the values periodically. Another scenario to consider is when the patient is alone and unable to communicate the emergency to family members and doctors. To help identify this situation, we propose a flexible access control framework using Complex Event Processing (CEP) technology. When an emergency is detected, the temporary access control policies (TACPs) are activated. These policies override the regular policies in emergency cases.

