Context: Although many papers have been published on software defect prediction techniques, machine learning approaches have yet to be fully explored. Objective: In this paper we suggest using a descriptive approach for defect prediction rather than the precise classification techniques that are usually adopted. This allows us to characterise defective modules with simple rules that can easily be applied by practitioners and deliver a practical (or engineering) approach rather than a highly accurate result. Method: We describe two well-known subgroup discovery algorithms, the SD algorithm and the CN2-SD algorithm to obtain rules that identify defect prone modules. The empirical work is performed with publicly available datasets from the Promise repository and object-oriented metrics from an Eclipse repository related to defect prediction. Subgroup discovery algorithms mitigate against characteristics of datasets that hinder the applicability of classification algorithms and so remove the need for preprocessing techniques. * Corresponding author Email addresses: daniel.rodriguezg@uah.es (Daniel Rodriguez), robertoruiz@upo.es (Roberto Ruiz), riquelme@lsi.us.es (Jose C. Riquelme), rachel.harrison@brookes.ac.uk (Rachel Harrison) Preprint submitted to IST May 5, 2013 Results: The results show that the generated rules can be used to guide testing effort in order to improve the quality of software development projects. Such rules can indicate metrics, their threshold values and relationships between metrics of defective modules.
Conclusions:The induced rules are simple to use and easy to understand as they provide a description rather than a complete classification of the whole dataset. Thus this paper represents an engineering approach to defect prediction, i.e., an approach which is useful in practice, easily understandable and can be applied by practitioners.