Huge amounts of data are collected and analyzed nowadays. When popular
rule-learning algorithms are applied to such "big" datasets, the number of
discovered rules can easily exceed thousands. To produce compact,
understandable and accurate classifiers, such rules have to be grouped and
pruned, so that only a reasonable number of them are presented to the end
user for inspection and further analysis. In this paper, we propose new
methods that are able to reduce the number of class association rules
produced by "classical" class association rule classifiers, while
maintaining an accurate classification model that is comparable to the ones
generated by state-of-the-art classification algorithms. More precisely, we
propose three new associative classifiers, DC, DDC and CDC, that use
distance-based agglomerative hierarchical clustering as a post-processing
step to reduce the number of rules. In the rule-selection step, each
algorithm uses a different strategy (based on database coverage and cluster
center). Experiments on selected datasets from the
UCI ML repository show that our methods learn classifiers
containing significantly fewer rules than state-of-the-art rule-learning
algorithms on datasets with a larger number of examples. At the same time,
the classification accuracy of the proposed classifiers is not significantly
different from that of state-of-the-art rule learners on most of the datasets.
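
The general idea of clustering rules and keeping one representative per
cluster can be illustrated with a minimal sketch. It is not the DC, DDC or
CDC procedure itself: the rule representation, the Jaccard distance between
antecedents, the complete-linkage criterion, clustering each class
separately, and the names rule_distance, cluster_rules, medoid and
reduce_rules are all assumptions made for illustration only.

```python
from itertools import combinations

# A rule is a pair (antecedent, class_label), where the antecedent is a
# frozenset of items. This representation is an assumption of the sketch.

def rule_distance(a, b):
    """Jaccard distance between two antecedents (an illustrative choice)."""
    union = a[0] | b[0]
    if not union:
        return 0.0
    return 1.0 - len(a[0] & b[0]) / len(union)

def cluster_rules(rules, k):
    """Naive complete-linkage agglomerative clustering down to k clusters."""
    clusters = [[r] for r in rules]
    while len(clusters) > k:
        # Pick the pair of clusters whose farthest members are closest.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: max(
                rule_distance(x, y)
                for x in clusters[p[0]]
                for y in clusters[p[1]]
            ),
        )
        clusters[i].extend(clusters[j])
        del clusters[j]  # safe because i < j
    return clusters

def medoid(cluster):
    """Rule with the smallest total distance to the rest of its cluster."""
    return min(cluster, key=lambda r: sum(rule_distance(r, o) for o in cluster))

def reduce_rules(rules, k_per_class):
    """Cluster the rules of each class and keep one medoid per cluster."""
    by_class = {}
    for r in rules:
        by_class.setdefault(r[1], []).append(r)
    selected = []
    for group in by_class.values():
        k = min(k_per_class, len(group))
        selected.extend(medoid(c) for c in cluster_rules(group, k))
    return selected

# Hypothetical toy rules, not taken from any dataset in the paper.
rules = [
    (frozenset({"a", "b"}), "yes"),
    (frozenset({"a", "b", "c"}), "yes"),
    (frozenset({"x", "y"}), "yes"),
    (frozenset({"x", "y", "z"}), "yes"),
    (frozenset({"p"}), "no"),
]
selected = reduce_rules(rules, k_per_class=2)
```

In this toy input, the four "yes" rules collapse into two clusters of similar
antecedents and the single "no" rule is kept, so five rules are reduced to
three. A selection strategy based on database coverage would additionally need
the training examples, which this sketch does not model.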