In many business contexts, the ultimate goal of knowledge discovery is not the knowledge itself, but putting it to use. Models or patterns found by data mining methods often require further post-processing to bring this about. For instance, in churn prediction, data mining may yield a model that predicts which customers are likely to end their contract; but companies are not just interested in knowing who is likely to do so, they want to know what they can do to avoid this. The models or patterns have to be transformed into actionable knowledge. Action mining explicitly addresses this. Currently, many action mining methods rely on a predictive model, obtained through data mining, to estimate the effect of certain actions and finally suggest actions with desirable effects. A major problem with this approach is that predictive models do not necessarily reflect a causal relationship between their inputs and outputs. This makes existing action mining methods less reliable. In this paper, we introduce ICE-CREAM, a novel approach to action mining that explicitly relies on an automatically obtained best estimate of the causal relationships in the data. Experiments confirm that ICE-CREAM performs much better than the current state of the art in action mining.
P. Shamsinejadbabaki et al. / Causality-based cost-effective action mining

A company is not merely interested in predicting which customers it is going to lose; it wants to know what can be done to avoid this.

Action Mining (AM) is the process of learning action rules from data. Relatively little work has been done in this area so far. Existing work includes Yang et al.'s method for learning actions from decision trees [2,3] and several versions of Ras et al.'s DEAR system for discovering action rules [4-6]. In all of these methods, the input data takes the form of a set of attribute-value pairs for each object. Furthermore, a certain profit is associated with specific values of one particular attribute, called the target attribute. These methods then try to uncover associations between the target attribute and the other attributes, and use these associations to find the most beneficial actions. The main difference between these methods lies in the technique by which they find associations: for example, Yang's method uses decision trees, while DEAR 2 uses classification rules.

Despite the innovations presented in existing AM methods, they suffer from an important drawback: they implicitly rely on the assumption that the available models (decision trees, association rules) are causal. It is well known from statistics that association or correlation does not imply causation. Even though the learned models do not merely express the existence of a correlation but also its nature (in the form of a predictive function), they suffer from the same problem. If a function f : X → Y learned from a data set is found to be accurate, this means that when we observe X = x in a new object, we can accurately predict that Y = f(x); but if we manually change the object's X value to X = x′, the...
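The pitfall described above can be made concrete with a small simulation (not from the paper; the variable names and numbers are illustrative). A hidden confounder Z drives both X and Y, so a regression of Y on X is highly predictive; yet setting X by intervention, which breaks the Z → X link, leaves Y unchanged, so an action on X based on that model would be wasted:

```python
# Illustrative sketch: an accurate predictive model can still mislead
# action mining when the X-Y association is due to a confounder Z.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Observational data: Z -> X and Z -> Y, but no causal edge X -> Y.
z = rng.normal(size=n)
x = z + 0.5 * rng.normal(size=n)
y = z + 0.5 * rng.normal(size=n)

# Fit a simple predictive model y ~ a*x + b by least squares.
a, b = np.polyfit(x, y, 1)          # slope a is clearly nonzero (~0.8)

# Intervention do(X = 2): X is forced externally, so Z no longer sets it;
# Y is generated exactly as before, since X never caused Y.
y_do = z + 0.5 * rng.normal(size=n)

print(f"predictive slope a          : {a:.2f}")
print(f"model's guess for E[Y|do(X=2)]: {a * 2.0 + b:.2f}")
print(f"actual E[Y|do(X=2)]          : {y_do.mean():.2f}")  # stays near 0
```

The model confidently predicts a large change in Y from the action, while the true post-intervention mean of Y is unaffected. This is precisely the gap between observing X = x and setting it that motivates building action mining on an estimate of the causal structure rather than on a purely predictive model.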