Data Mining (DM) is the process of extracting interesting and previously unknown knowledge from data. This study proposes a new algorithm, called FD_Discover, for discovering Functional Dependencies (FDs) from databases. The algorithm employs concepts from relational database design theory, specifically equivalences and the minimal cover. It yields a large improvement in performance in comparison with a recent and similar algorithm called FD_MINE.

Key words: Data mining, functional dependencies, equivalent classes, minimal cover

INTRODUCTION

FDs are relationships (constraints) between the attributes of a database relation; an FD states that the values of some attributes are uniquely determined by the values of some other attributes. Discovering FDs from databases is useful for reverse engineering legacy systems whose design information has been lost. Discovering FDs can also help a database designer decompose a relational schema into several relations through the normalization process, eliminating some of the problems of unsatisfactory database design. Identifying the dependencies that are satisfied by a database instance is an important topic in the data mining literature [5]. The problem of discovering FDs from relational databases has been studied in [3,4,7,8,11,12]. A straightforward algorithm has been shown to require time exponential in the size of the input (the number of attributes and the number of tuples in a relation).

In this study, we propose a new algorithm for discovering FDs from static databases. The algorithm employs concepts from relational database theory, such as the theory of equivalences and the minimal cover of FDs. It aims to minimize the time requirements of algorithms that discover FDs from databases. We compare the results of the proposed algorithm with those of a previous well-known algorithm called FD_MINE [12].
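To make the notion of an FD being satisfied by a database instance concrete, the following is a minimal sketch in Python of a check for a single candidate dependency X → Y. It is an illustration only, not part of FD_Discover or FD_MINE; the function name `fd_holds`, the row representation (a list of dictionaries), and the sample relation are assumptions introduced here.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

Row = Dict[str, Hashable]  # assumed representation: attribute name -> value


def fd_holds(rows: List[Row], lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """Return True if the instance `rows` satisfies the FD lhs -> rhs.

    The FD holds when any two tuples that agree on every attribute
    in `lhs` also agree on every attribute in `rhs`.
    """
    seen: Dict[Tuple[Hashable, ...], Tuple[Hashable, ...]] = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # Remember the first rhs value seen for this lhs value;
        # any later disagreement violates the dependency.
        if seen.setdefault(key, val) != val:
            return False
    return True


# Hypothetical sample relation, used only for illustration.
employees: List[Row] = [
    {"emp": 1, "dept": "A", "mgr": "X"},
    {"emp": 2, "dept": "A", "mgr": "X"},
    {"emp": 3, "dept": "B", "mgr": "Y"},
]

dept_determines_mgr = fd_holds(employees, ["dept"], ["mgr"])
mgr_determines_emp = fd_holds(employees, ["mgr"], ["emp"])
```

Checking one candidate takes a single pass over the tuples. The cost of discovery comes from the number of candidate attribute sets that would have to be tested in this way.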
Some previous studies related to discovering FDs from databases [2,4,8] have focused on embedded parallel query execution, optimizing queries, providing summaries over large data sets, or discovering association rules in stream data.

Other studies [3,7,9,11,12] have presented various methods for discovering FDs from very large databases. These studies faced a general problem: time requirements that grow exponentially with database size (the dimensionality problem in the number of tuples and attributes).

Definition 1: If A is an attribute or a set of attributes in relation R, then all the attributes in R that are functionally dependent on A with respect to F (where F is a set of FDs that hold on R) form the closure of A, denoted by A+ or Closure(A).

Definition 2: The nontrivial closure of attribute A with respect to F, denoted by Closure→(A), is defined as Closure→(A) = Closure(A) - A.
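Definitions 1 and 2 can be illustrated with a short Python sketch. It uses the standard textbook fixed-point procedure for attribute closure, which is an assumption here and not necessarily how FD_Discover computes closures. The function names, the representation of F as a list of (left-hand side, right-hand side) pairs of attribute sets, and the sample FDs are all hypothetical.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[Set[str], Set[str]]  # assumed representation: (lhs, rhs)


def closure(attrs: Iterable[str], fds: List[FD]) -> FrozenSet[str]:
    """Closure(A): every attribute functionally determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left-hand side is already covered, the right-hand
            # side is determined as well; add whatever is missing.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def nontrivial_closure(attrs: Iterable[str], fds: List[FD]) -> FrozenSet[str]:
    """Closure→(A) = Closure(A) - A, as in Definition 2."""
    attrs = frozenset(attrs)
    return closure(attrs, fds) - attrs


# Hypothetical FD set F = {A -> B, B -> C, CD -> E}, for illustration only.
F: List[FD] = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"C", "D"}, {"E"})]

closure_a = closure({"A"}, F)
nontrivial_a = nontrivial_closure({"A"}, F)
closure_ad = closure({"A", "D"}, F)
```

With this F, the closure of A is {A, B, C} and its nontrivial closure is {B, C}. Adding D to the starting set brings in E through CD → E.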