In this short paper, we compare well-known rule/tree classifiers for software defect prediction with CTC, a decision tree classifier designed to deal with class imbalance. It is well known that most software defect prediction datasets are highly imbalanced (non-defective instances outnumber defective ones). In this work, we focused only on tree/rule classifiers because they can explain their decisions, i.e., describe the metrics and thresholds that make a module error-prone. Furthermore, rules and decision trees have the advantage of being easily understood and applied by project managers and quality assurance personnel. The CTC algorithm was designed to cope with class imbalance and noisy datasets directly, instead of relying on preprocessing techniques (oversampling or undersampling), ensembles, or misclassification cost weights. The experimental work was carried out using the NASA datasets, and the results showed that the induced CTC decision trees performed better than or similarly to the rest of the rule/tree classifiers.
The consolidation process, originally applied to the C4.5 tree induction algorithm, improved its discriminating capacity and stability. Consolidation creates multiple samples and builds a simple (non-multiple) classifier by applying the ensemble process during the model construction phase. The work presented in this paper aims to show that consolidation can improve algorithms other than C4.5 by applying it to three tree induction algorithms: a variant of the chi-squared automatic interaction detector (CHAID*), C4.4, and CHAIC (also a contribution of this paper). Consolidating CHAID* and CHAIC required solving the problem of consolidating the value groupings that each CHAID* or CHAIC tree proposes for discrete attributes. The experimentation is divided into three classification contexts for 96 datasets. Results show that the consolidated algorithms perform robustly, ranking competitively in all contexts and never falling into the lower positions, unlike most of the other rule-induction algorithms considered in the study. In a global comparison, the consolidated algorithms rank in the top positions.
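The idea of applying the ensemble step during model construction can be illustrated with a minimal sketch. It is not the authors' implementation: it shows a single node decision only, uses plain information gain on discrete attributes rather than the split criteria of C4.5, C4.4, CHAID*, or CHAIC, and assumes class-balanced subsamples. All function and parameter names (`consolidated_split`, `class_balanced_subsample`, `n_samples`) are hypothetical.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def info_gain(rows, labels, attr):
    """Information gain of splitting on one discrete attribute."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def class_balanced_subsample(rows, labels, rng):
    """Draw a subsample with as many examples per class as the rarest class has."""
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    size = min(len(idx) for idx in by_class.values())
    chosen = [i for idx in by_class.values() for i in rng.sample(idx, size)]
    return [rows[i] for i in chosen], [labels[i] for i in chosen]


def consolidated_split(rows, labels, attrs, n_samples=5, seed=0):
    """Each subsample proposes its best attribute; the majority proposal wins."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_samples):
        sub_rows, sub_labels = class_balanced_subsample(rows, labels, rng)
        best = max(attrs, key=lambda a: info_gain(sub_rows, sub_labels, a))
        votes[best] += 1
    return votes.most_common(1)[0][0]
```

In a full tree builder, the agreed split would presumably be applied to every subsample and the vote repeated at each child node, so that the outcome is one tree rather than a committee of trees.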