Frequent patterns are an important class of regularities in a transaction database. Certain frequent patterns with a low minimum support (minsup) value can provide useful information in many real-world applications. However, extracting these patterns with single-minsup-based frequent pattern mining algorithms such as Apriori and FP-growth leads to the "rare item problem": at a high minsup value, the patterns that are frequent only at a low minsup are missed, and at a low minsup value, the number of frequent patterns explodes. In the literature, a "multiple minsups framework" was proposed to discover such frequent patterns, and frequent pattern mining techniques such as the Multiple Support Apriori and Conditional Frequent Pattern-growth (CFP-growth) algorithms have been proposed for it. As the frequent patterns mined with this framework do not satisfy the downward closure property, these algorithms rely on different pruning techniques to reduce the search space. In this paper, we propose an efficient CFP-growth algorithm based on new pruning techniques. Experimental results show that the proposed pruning techniques are effective.
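The rare item problem and the loss of downward closure can be illustrated with a minimal sketch. It assumes the usual formulation of the multiple minsups framework, in which each item has its own minimum item support and a pattern is frequent when its support is at least the lowest such value among its items. The toy transactions, the threshold values, and the names `support_count` and `mine` are hypothetical, and the brute-force enumeration stands in for Apriori or CFP-growth only for illustration.

```python
from itertools import combinations

# Toy transaction database (hypothetical data, not taken from the paper).
TRANSACTIONS = [
    {"bread", "milk"}, {"bread", "milk"},
    {"bread", "caviar"}, {"bread", "caviar"},
    {"milk"}, {"milk"}, {"milk"}, {"milk"}, {"milk"}, {"milk"},
]

def support_count(itemset, transactions):
    """Number of transactions that contain every item of the itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)

def mine(transactions, mis, max_len=2):
    """Brute-force miner: a pattern is frequent when its support count is at
    least the smallest minimum item support (MIS) among its items.
    Passing the same MIS for every item gives single-minsup mining."""
    result = {}
    for k in range(1, max_len + 1):
        for pattern in combinations(sorted(mis), k):
            threshold = min(mis[item] for item in pattern)
            count = support_count(pattern, transactions)
            if count >= threshold:
                result[pattern] = count
    return result

# Single high minsup: every pattern involving the rare item is missed.
high = mine(TRANSACTIONS, {"bread": 5, "milk": 5, "caviar": 5})
# Single low minsup: the rare item is kept, but so is everything else.
low = mine(TRANSACTIONS, {"bread": 2, "milk": 2, "caviar": 2})
# Multiple minsups: a low threshold only for the rare item.
multi = mine(TRANSACTIONS, {"bread": 5, "milk": 5, "caviar": 2})

print(high)
print(low)
print(multi)
```

In this sketch, `multi` contains the pattern `("bread", "caviar")` although `("bread",)` alone is not frequent, which is why Apriori-style subset pruning cannot be applied directly under this framework.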
Abstract: In this paper, we propose an improved approach to extract rare association rules. Rare association rules are association rules that contain rare items, that is, items that occur less frequently. When extracting rare itemsets, single minimum support (minsup) based approaches such as Apriori suffer from the "rare item problem" dilemma: at a high minsup value, rare itemsets are missed, and at a low minsup value, the number of frequent itemsets explodes. To extract rare itemsets, an effort has been made in the literature in which the minsup of each item is set to a percentage of its support. Even though this approach improves performance over single-minsup-based approaches, it still suffers from the "rare item problem" dilemma. If the percentage is set high, rare itemsets are missed because the minsup of each rare item becomes close to its support; if the percentage is set low, the number of frequent itemsets explodes. In this paper, we propose an improved approach in which the minsup of each item is set based on the notion of "support difference". The proposed approach assigns appropriate minsup values to frequent as well as rare items based on their item supports and reduces both the "rule missing" and "rule explosion" problems. Experimental results on both synthetic and real-world datasets show that the proposed approach improves performance over existing approaches by limiting the explosion in the number of frequent itemsets involving frequent items without missing the frequent itemsets involving rare items.
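The contrast between the two ways of assigning a per-item minsup can be sketched as follows. The abstract does not state the formulas, so both are assumptions here: the percentage-based scheme is taken as the larger of a fixed percentage of the item's support and a user-given lowest allowed minsup, and the support-difference scheme as the larger of the item's support minus a fixed difference and that same lowest allowed minsup. The function names, parameter names, and support counts are hypothetical.

```python
# Hypothetical support counts in a database of 1000 transactions.
SUPPORTS = {"bread": 800, "milk": 600, "caviar": 50}
LEAST_SUPPORT = 20  # assumed user-given lowest allowed minsup (as a count)

def mis_percentage(support, percent, least_support=LEAST_SUPPORT):
    """Assumed percentage-based scheme: minsup is a fixed percentage of the
    item's support, bounded below by least_support."""
    return max(support * percent // 100, least_support)

def mis_support_difference(support, difference, least_support=LEAST_SUPPORT):
    """Assumed support-difference scheme: minsup is the item's support minus
    a fixed difference, bounded below by least_support."""
    return max(support - difference, least_support)

high_pct = {i: mis_percentage(s, 90) for i, s in SUPPORTS.items()}
low_pct = {i: mis_percentage(s, 25) for i, s in SUPPORTS.items()}
by_diff = {i: mis_support_difference(s, 100) for i, s in SUPPORTS.items()}

print(high_pct)
print(low_pct)
print(by_diff)
```

Under these assumptions, a high percentage puts the minsup of `caviar` (45) close to its support (50), so itemsets containing it are likely to be missed, while a low percentage drops the minsup of `bread` to 200 and admits many more itemsets of frequent items. With a fixed difference, the frequent items keep thresholds near their supports and the rare item falls back to the lowest allowed minsup.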