Abstract. Correlated pattern mining has become an increasingly important task in data mining and knowledge discovery. Recently, concise exact representations of frequent correlated patterns and of rare correlated patterns, based on the Jaccard measure, were presented. In this paper, we propose a method for inferring new knowledge from these concise representations. We introduce a generic approach, called GMJP, that extracts the sets of frequent correlated patterns and of rare correlated patterns, together with their associated concise representations. The derived knowledge takes the form of association rules, which can be either exact or approximate. We also illustrate the efficiency of our approach on several data sets and show that Jaccard-based classification rules give very encouraging results.
Correlated pattern mining has become an increasingly important task in data mining and knowledge discovery. In practice, the exploitation of correlated patterns is hampered by the large number of generated patterns. Combining the frequency constraint with the correlation constraint has therefore proved valuable, leading to the mining of frequent correlated patterns [2,14] and rare correlated patterns [4,3]. In this setting, the main task is the joint handling of the correlation and frequency constraints. One way to deal with this issue is to mine all the correlated patterns and then filter them by the frequency constraint. However, this filtering is done as a post-processing phase: it suffers from the large number of patterns and loses the opportunity to exploit the selectivity of both constraints. In this paper, we introduce an approach that focuses on mining rare correlated patterns according to the bond measure. It relies on the simultaneous integration of the anti-monotone constraint of correlation and the monotone constraint of rarity during the mining process. Our experimental studies show an important benefit from pushing constraints of distinct types early. We also report better performance than the GMJP approach [3], which also deals with both types of constraints.
In recent years, many works have focused on the extraction and exploitation of rare patterns. These patterns convey knowledge about rare and unexpected events, and are hence useful in several application fields. Nevertheless, a main criticism of rare pattern extraction approaches is, on the one hand, the high number of mined patterns and, on the other hand, the low quality of many of them: a rare pattern may not exhibit a strong correlation between the items it contains. To overcome these limits, we propose to integrate the bond correlation measure so as to mine only the rare patterns that satisfy a correlation threshold. We then characterize the resulting set of rare correlated patterns by studying the constraints of distinct types induced by rarity and correlation. In addition, based on the equivalence classes associated with a closure operator dedicated to the bond measure, we introduce new concise representations of rare correlated patterns, as well as the derivation process of the generic bases of rare correlated association rules. We then design the RCPRMINER algorithm, which efficiently extracts the proposed concise representations. Experimental studies highlight the very encouraging compactness rates offered by the proposed concise representations and show the good performance of the RCPRMINER algorithm.
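As an illustration of the two constraints discussed above, the following minimal Python sketch computes the bond of an itemset (its conjunctive support divided by its disjunctive support) and enumerates rare correlated patterns level by level. It is not the RCPRMINER or GMJP algorithm: the function names `bond` and `rare_correlated`, the absolute thresholds `min_supp` and `min_bond`, and the toy transactions are all assumptions made for this example. The sketch only uses the anti-monotonicity of the correlation constraint to prune candidates, then keeps the survivors whose support is below `min_supp`.

```python
from itertools import combinations


def bond(itemset, transactions):
    """Bond = conjunctive support / disjunctive support of the itemset."""
    items = set(itemset)
    conj = sum(1 for t in transactions if items <= t)  # contains every item
    disj = sum(1 for t in transactions if items & t)   # contains at least one item
    return conj / disj if disj else 0.0


def rare_correlated(transactions, min_supp, min_bond):
    """Return {pattern: (support, bond)} with support < min_supp and bond >= min_bond.

    Hypothetical level-wise sketch: candidates failing the correlation
    threshold are dropped early, since none of their supersets can pass it.
    """
    all_items = sorted(set().union(*transactions))
    level = [frozenset([i]) for i in all_items]
    result = {}
    while level:
        survivors = []
        for pattern in level:
            b = bond(pattern, transactions)
            if b < min_bond:
                continue  # anti-monotone constraint: prune this branch
            survivors.append(pattern)
            supp = sum(1 for t in transactions if pattern <= t)
            if supp < min_supp:  # rarity constraint (monotone)
                result[pattern] = (supp, b)
        # Build the next level from pairs of correlated patterns that differ
        # by one item, keeping a candidate only if all its subsets survived.
        kept = set(survivors)
        next_level = set()
        for left, right in combinations(survivors, 2):
            union = left | right
            if len(union) == len(left) + 1 and all(union - {i} in kept for i in union):
                next_level.add(union)
        level = list(next_level)
    return result


# Toy data, assumed for illustration only.
transactions = [{"a", "b"}, {"a", "b"}, {"a", "c"}, {"a"}, {"d", "e"}]
print(bond({"a", "b"}, transactions))  # 2 / 4 = 0.5
print(rare_correlated(transactions, min_supp=2, min_bond=0.5))
```

On this toy data, the pair {d, e} is rare (support 1) yet perfectly correlated (bond 1), whereas {a, c} is discarded by the correlation threshold before its rarity is ever checked.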