Abstract. Generalized association rules are a very important extension of boolean association rules, but with current approaches mining generalized rules is computationally very expensive. This cost is especially problematic when rule generation is part of an interactive KDD process. In this paper we discuss strengths and weaknesses of known approaches to generating frequent itemsets. Based on these insights we derive a new algorithm, called Prutax, to mine generalized frequent itemsets. The basic ideas of the algorithm and further optimisations are described. Experiments with both synthetic and real-life data show that Prutax is an order of magnitude faster than previous approaches.
Abstract. Knowledge Discovery in Databases (KDD) is currently a hot topic in industry and academia. Although KDD is now widely accepted as a complex process of many different phases, the focus of research behind most emerging products is on underlying algorithms and modelling techniques. The main bottleneck for KDD applications is not the lack of techniques. The challenge is to exploit and combine existing algorithms effectively, and help the user during all phases of the KDD process. In this paper, we describe the project Citrus which addresses these practically relevant issues. Starting from a commercially available system, we develop a scalable, extensible tool inherently based on the view of KDD as an interactive and iterative process. We sketch the main components of this system, namely an information manager for effective retrieval of data and results, an execution server for efficient execution, and a process support interface for guiding the user through the process.
The main contribution of this paper is a two-step method for inventing new predicates which overcomes some of the shortcomings of previously published methods (implemented in Cigol & LFP2). The method integrates abductive and inductive learning. In the first step, proofs of the training instances are completed by assuming new facts built from a new predicate symbol. In the second step, the general clause derived in order to explain the training instances is used to generate more instances of the newly invented predicate. These instances are then used to induce a general definition of the new predicate.
Direct marketing is an increasingly popular application of data mining. In this paper we summarize some of our own experiences from various data mining application projects for direct marketing. We focus on a particular project environment and describe tools which address issues across the whole data mining process. These tools include a Quick Reference Guide for the standardization of the process and for user guidance and a library of re-usable procedures in the commercial data mining tool Clementine. We report experiences with these tools and identify open issues requiring further research. In particular, we focus on evaluation measures for predictive models.