Most existing association rule mining algorithms are designed to extract knowledge from databases with binary-valued attributes. In real-world applications, however, databases usually contain continuous values such as height, length or weight. When attributes are continuous, these algorithms are commonly combined with a discretization method that transforms them into discrete attributes. Discretization transforms a continuous attribute into a finite number of intervals and assigns each interval a discrete numerical value. However, the user usually has to specify the number of intervals or supply heuristic rules to guide the discretization, which makes it difficult to achieve the highest attribute interdependency while also keeping the number of intervals as low as possible. In this paper we present an association rule mining algorithm suited to the continuous-valued attributes commonly found in scientific and statistical databases. We propose a method based on a new graph-based evolutionary algorithm named 'genetic network programming (GNP)' that handles continuous values directly, that is, without applying any discretization method as a preprocessing step. GNP represents its individuals as graph structures and evolves them to find a solution; this feature yields very compact programs and implicitly memorizes past action sequences. In the proposed method, the significance of each extracted association rule is measured with the χ² test, and only important rules are accumulated in a common pool across generations. Results of experiments conducted on a real-life database suggest that the proposed method is an effective technique for handling continuous attributes.
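The significance test can be made concrete. For a candidate rule X ⇒ Y over N records, let x, y and z be the supports of X, Y and X ∪ Y; the χ² statistic of the corresponding 2×2 contingency table (1 degree of freedom) then reduces to χ² = N(z − xy)² / (x(1 − x)y(1 − y)). The Python below is a minimal sketch, not the GNP method itself. It assumes a hypothetical `Condition` that checks whether a raw continuous value lies in a closed interval (standing in for however GNP evaluates continuous attributes), computes supports directly on undiscretized values, and stores a rule in a pool only if its χ² exceeds 3.84 (the 5% critical value for 1 degree of freedom) and the dependence is positive. The names `Condition`, `support`, `chi_square` and `maybe_store`, the 3.84 threshold and the positive-dependence check are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Sequence, Set, Tuple


@dataclass(frozen=True)
class Condition:
    """Hypothetical rule item: lower <= record[attr] <= upper on raw values."""
    attr: int
    lower: float
    upper: float

    def holds(self, record: Sequence[float]) -> bool:
        return self.lower <= record[self.attr] <= self.upper


Itemset = Tuple[Condition, ...]
CHI2_CRITICAL = 3.84  # assumed threshold: chi-square critical value, 1 dof, alpha = 0.05


def support(items: Itemset, data: List[Sequence[float]]) -> float:
    """Fraction of records that satisfy every condition (no discretization needed)."""
    return sum(all(c.holds(r) for c in items) for r in data) / len(data)


def chi_square(x: float, y: float, z: float, n: int) -> float:
    """Chi-square of the 2x2 table for X => Y, from supports x, y and z = sup(X and Y)."""
    denom = x * (1 - x) * y * (1 - y)
    if denom == 0:
        return 0.0  # X or Y holds for all or none of the records
    return n * (z - x * y) ** 2 / denom


def maybe_store(pool: Set[Tuple[Itemset, Itemset]], antecedent: Itemset,
                consequent: Itemset, data: List[Sequence[float]]) -> bool:
    """Add X => Y to the pool if it is significant, positively dependent and new."""
    x, y = support(antecedent, data), support(consequent, data)
    z = support(antecedent + consequent, data)
    # Assumption: chi-square also flags negative dependence, so require z > x*y.
    significant = chi_square(x, y, z, len(data)) > CHI2_CRITICAL and z > x * y
    rule = (antecedent, consequent)
    if significant and rule not in pool:
        pool.add(rule)
        return True
    return False


# Usage: records are [height_cm, weight_kg]; is "tall => heavy" significant?
data = [[150.0, 45.0], [155.0, 50.0], [160.0, 55.0],
        [175.0, 75.0], [180.0, 80.0], [185.0, 85.0]]
tall = (Condition(0, 170.0, 200.0),)
heavy = (Condition(1, 70.0, 100.0),)
pool: Set[Tuple[Itemset, Itemset]] = set()
print(chi_square(support(tall, data), support(heavy, data),
                 support(tall + heavy, data), len(data)))  # 6.0
print(maybe_store(pool, tall, heavy, data))  # True
print(maybe_store(pool, tall, heavy, data))  # False: already in the pool
```

Because the pool is a set of hashable rules, a rule rediscovered in a later generation is not stored twice, which mirrors the idea of accumulating only important rules across generations.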
Among the several methods reported for extracting association rules, a new evolutionary method named Genetic Network Programming (GNP) has proven effective for small databases, that is, databases with a relatively small number of attributes. However, this conventional GNP method cannot handle large databases with a huge number of attributes, because its search space becomes very large and its run-time performance degrades. The aim of this paper is to propose a new method for extracting association rules from large and dense databases with a huge number of attributes by combining the conventional GNP-based mining method with a specially designed genetic algorithm (GA). Each of these evolutionary methods works at its own processing level, and the two are tightly synchronized so that they act as one system. Our strategy consists of dividing a large and dense database into many small databases. These small databases are treated as individuals and form a population. The conventional GNP-based mining method is then applied to extract association rules from each of these individuals. Finally, the population is evolved over several generations using GA with special genetic operators that take the acquired information into account. Two complementary processing levels are defined, a Global Level and a Local Level, each with its own independent tasks and processes. The GA process is mainly carried out at the Global Level, whereas at the Local Level the conventional GNP-based mining method runs in parallel, and each run generates its own local pool of association rules. Several special genetic operations for the GA at the Global Level are proposed, and the performance of each of them and of their combinations is shown and compared. In our simulations, the conventional GNP-based mining method and the proposed method are compared on a real-world large and dense database with a huge number of attributes. The results show that extending the conventional GNP-based mining method with GA makes it possible to extract association rules from large and dense databases directly and more efficiently than with the conventional GNP method.
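The two-level strategy can be sketched in Python as a toy illustration. It rests on several assumptions and is not the authors' system: records are binary (0/1) transactions; each individual (small database) is assumed to be a subset of k attribute columns; the Local Level miner is a simple stand-in that χ²-tests every ordered attribute pair instead of running GNP; and fitness (number of rules found), truncation selection, crossover and mutation are generic GA operators rather than the special operators proposed in the paper. The names `local_mine`, `crossover`, `mutate` and `evolve` are hypothetical. Local pools are merged into one global pool every generation.

```python
import random
from itertools import permutations
from typing import List, Set, Tuple

Rule = Tuple[int, int]  # hypothetical rule "attribute a = 1 => attribute b = 1"
Individual = Tuple[int, ...]  # assumed: a small database = a subset of attribute columns
CHI2_CRITICAL = 3.84  # chi-square critical value, 1 dof, alpha = 0.05


def chi_square(x: float, y: float, z: float, n: int) -> float:
    denom = x * (1 - x) * y * (1 - y)
    return 0.0 if denom == 0 else n * (z - x * y) ** 2 / denom


def local_mine(data: List[List[int]], attrs: Individual, min_sup: float) -> Set[Rule]:
    """Local Level stand-in for GNP: test every ordered attribute pair of one small database."""
    n, rules = len(data), set()
    for a, b in permutations(attrs, 2):
        x = sum(r[a] for r in data) / n
        y = sum(r[b] for r in data) / n
        z = sum(r[a] * r[b] for r in data) / n
        if z >= min_sup and z > x * y and chi_square(x, y, z, n) > CHI2_CRITICAL:
            rules.add((a, b))
    return rules


def crossover(p1: Individual, p2: Individual, k: int, rng: random.Random) -> Individual:
    """Child takes k attributes drawn from the union of both parents."""
    return tuple(sorted(rng.sample(sorted(set(p1) | set(p2)), k)))


def mutate(ind: Individual, n_attrs: int, rng: random.Random) -> Individual:
    """Replace one attribute with a random attribute not already in the subset."""
    outside = [a for a in range(n_attrs) if a not in ind]
    if not outside:
        return ind
    genes = list(ind)
    genes[rng.randrange(len(genes))] = rng.choice(outside)
    return tuple(sorted(genes))


def evolve(data: List[List[int]], k: int = 4, pop_size: int = 8, generations: int = 10,
           min_sup: float = 0.1, p_mut: float = 0.5, seed: int = 0) -> Set[Rule]:
    rng = random.Random(seed)
    n_attrs = len(data[0])
    population = [tuple(sorted(rng.sample(range(n_attrs), k))) for _ in range(pop_size)]
    global_pool: Set[Rule] = set()
    for _ in range(generations):
        # Local Level: each small database is mined independently (parallelizable).
        local_pools = [local_mine(data, ind, min_sup) for ind in population]
        for pool in local_pools:
            global_pool |= pool
        # Global Level: keep the half that produced the most rules, breed the rest.
        ranked = sorted(zip(local_pools, population), key=lambda t: len(t[0]), reverse=True)
        parents = [ind for _, ind in ranked[: pop_size // 2]]
        children = []
        while len(parents) + len(children) < pop_size:
            child = crossover(*rng.sample(parents, 2), k, rng)
            if rng.random() < p_mut:
                child = mutate(child, n_attrs, rng)
            children.append(child)
        population = parents + children
    return global_pool


# Usage: 12 random binary attributes, with attribute 1 planted as a copy of attribute 0.
rng = random.Random(42)
data = [[rng.randint(0, 1) for _ in range(12)] for _ in range(200)]
for row in data:
    row[1] = row[0]
print(sorted(evolve(data)))
```

Whether the planted rule between attributes 0 and 1 appears in the global pool depends on some small database ever containing both columns, which is exactly what the Global Level search is meant to encourage.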
The idea of combining association rule mining with fuzzy set theory has been applied frequently in recent years [1][2][3][4][5]. It originally arose from dealing with quantitative attributes in a database, where discretizing the quantitative values into intervals leads to under- or overestimation of the values near the interval borders. This is called the sharp boundary problem. Fuzzy sets help overcome this problem by allowing partial degrees of membership rather than only the values 1 and 0 used by traditional methods. An attribute value can thereby be a member of more than one set, which gives a more realistic view of such data. Moreover, fuzzy set theory has proven to be a very useful tool in association rule mining because the mined rules can be expressed in linguistic terms, which are more natural and understandable for human beings. This linguistic representation is particularly useful when the discovered rules are presented to human experts for study. In this paper, we propose a novel association rule mining approach that integrates the evolutionary optimization technique 'genetic network programming (GNP)' with fuzzy set theory to mine interesting fuzzy rules from given quantitative data. The performance of our algorithm is compared with other relevant algorithms, and the experimental results show the advantages and effectiveness of the proposed model.
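Partial membership can be illustrated with triangular membership functions, one common choice in fuzzy association rule mining. In the Python sketch below, the linguistic terms and their breakpoints (`short`, `medium`, `tall`, `light`, `heavy`) are hypothetical, and fuzzy support is assumed to be the mean, over all records, of the minimum membership degree among the rule's items (the min t-norm); the paper may use different membership functions or operators. With these assumptions a height of 160 cm is partly `short` and partly `medium` instead of falling into a single crisp interval.

```python
from typing import Callable, Dict, List, Tuple

Membership = Callable[[float], float]


def triangular(a: float, b: float, c: float) -> Membership:
    """Membership rising linearly from 0 at a to 1 at b, then falling to 0 at c."""
    def mu(v: float) -> float:
        if v <= a or v >= c:
            return 0.0
        return (v - a) / (b - a) if v <= b else (c - v) / (c - b)
    return mu


# Hypothetical linguistic terms for two quantitative attributes.
TERMS: Dict[str, Dict[str, Membership]] = {
    "height": {"short": triangular(140, 150, 165),
               "medium": triangular(150, 165, 180),
               "tall": triangular(165, 180, 195)},
    "weight": {"light": triangular(30, 50, 70),
               "heavy": triangular(60, 80, 100)},
}
Item = Tuple[str, str]  # (attribute, linguistic term)


def fuzzy_support(items: List[Item], records: List[Dict[str, float]]) -> float:
    """Mean over records of the min membership degree of all items (min t-norm)."""
    return sum(min(TERMS[attr][term](rec[attr]) for attr, term in items)
               for rec in records) / len(records)


def fuzzy_confidence(antecedent: List[Item], consequent: List[Item],
                     records: List[Dict[str, float]]) -> float:
    """Fuzzy support of the whole rule divided by fuzzy support of its antecedent."""
    sup_a = fuzzy_support(antecedent, records)
    return 0.0 if sup_a == 0 else fuzzy_support(antecedent + consequent, records) / sup_a


# A value near an interval border belongs to two fuzzy sets at once.
print(TERMS["height"]["short"](160), TERMS["height"]["medium"](160))
records = [{"height": 160, "weight": 55}, {"height": 180, "weight": 80},
           {"height": 175, "weight": 75}]
print(fuzzy_confidence([("height", "tall")], [("weight", "heavy")], records))
```

A mined rule such as "height is tall ⇒ weight is heavy" can then be reported directly in these linguistic terms.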