We tackle the novel problem of mining contrast subspaces. Given a set of multidimensional objects in two classes, C+ and C−, and a query object o, we want to find the top-k subspaces that maximize the ratio of the likelihood of o in C+ to that in C−. Such subspaces are useful for characterizing an object and explaining how it differs between the two classes. We demonstrate that this problem has important applications and, at the same time, is very challenging, being MAX SNP-hard. We present CSMiner, a mining method that uses kernel density estimation in conjunction with various pruning techniques.
We experimentally investigate the performance of CSMiner on a range of data sets, evaluating its efficiency, effectiveness, and stability, and demonstrating that it is substantially faster than a baseline method.

Lei Duan
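To make the problem statement concrete, the following is a minimal sketch of the likelihood-contrast objective, evaluated by exhaustive enumeration over all non-empty subspaces. It is not CSMiner: it applies no pruning, and the function names (`kde_likelihood`, `top_k_contrast_subspaces`), the Gaussian product kernel, the rule-of-thumb bandwidth, and the small constant `eps` guarding against division by zero are all assumptions made for illustration.

```python
from itertools import combinations
import heapq

import numpy as np


def kde_likelihood(query, data, dims):
    """Gaussian kernel density estimate of `query` among `data`, using only `dims`."""
    idx = list(dims)
    x = data[:, idx]           # objects projected onto the subspace
    q = query[idx]             # query projected onto the subspace
    n, d = x.shape
    # Rule-of-thumb bandwidth (an assumption of this sketch, requires n >= 2).
    sigma = x.std(axis=0, ddof=1).mean()
    h = max(sigma * (4.0 / ((d + 2) * n)) ** (1.0 / (d + 4)), 1e-12)
    sq_dist = ((x - q) ** 2).sum(axis=1)
    kernel = np.exp(-sq_dist / (2.0 * h * h))
    return kernel.sum() / (n * (np.sqrt(2.0 * np.pi) * h) ** d)


def top_k_contrast_subspaces(query, pos, neg, k=3, eps=1e-12):
    """Return the k (ratio, subspace) pairs with the largest likelihood contrast.

    Enumerates all 2^d - 1 non-empty subspaces, so it is only feasible
    for a small number of dimensions.
    """
    n_dims = pos.shape[1]
    scored = []
    for size in range(1, n_dims + 1):
        for dims in combinations(range(n_dims), size):
            ratio = kde_likelihood(query, pos, dims) / (
                kde_likelihood(query, neg, dims) + eps
            )
            scored.append((ratio, dims))
    return heapq.nlargest(k, scored)


# Usage on synthetic data: only dimension 0 separates the two classes.
rng = np.random.default_rng(0)
pos = rng.normal(0.0, 1.0, size=(200, 3))
neg = rng.normal(0.0, 1.0, size=(200, 3))
neg[:, 0] += 5.0
query = np.zeros(3)
for ratio, dims in top_k_contrast_subspaces(query, pos, neg, k=2):
    print(dims, ratio)
```

Because the number of subspaces grows exponentially with the dimensionality, this brute-force search serves only as a reference point for small examples.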