Igor Ibarguren scite author profile

In this short paper, we compare well-known rule/tree classifiers in software defect prediction with the CTC decision tree classifier, which was designed to deal with class imbalance. It is well known that most software defect prediction datasets are highly imbalanced (non-defective instances outnumber defective ones). In this work, we focused only on tree/rule classifiers because they are capable of explaining the decision, i.e., describing the metrics and thresholds that make a module error-prone. Furthermore, rules and decision trees have the advantage of being easily understood and applied by project managers and quality assurance personnel. The CTC algorithm was designed to cope with class imbalance and noisy datasets without resorting to preprocessing techniques (oversampling or undersampling), ensembles, or misclassification cost weights. The experimental work was carried out using the NASA datasets, and the results showed that induced CTC decision trees performed better than or similarly to the rest of the rule/tree classifiers.

CT <DT>: Extending the application of the consolidation methodology even further

Ibarguren, Pérez, Muguerza, et al. 2017

Expert Systems

The consolidation process, originally applied to the C4.5 tree induction algorithm, improved its discriminating capacity and stability. Consolidation creates multiple samples and builds a simple (nonmultiple) classifier by applying the ensemble process during the model construction phase. This paper aims to show that consolidation can improve algorithms other than C4.5 by applying it to three tree induction algorithms: a variant of the chi‐squared automatic interaction detector (CHAID*), C4.4, and CHAIC (also a contribution of this paper). The consolidation of CHAID* and CHAIC required overcoming the difficulty of consolidating the value groupings proposed by each CHAID* or CHAIC tree for discrete attributes. The experimentation is divided into three classification contexts for 96 datasets. Results show that consolidated algorithms perform robustly, ranking competitively in all contexts and never falling into lower positions, unlike most of the other rule‐induction algorithms considered in the study. In a global comparison, consolidated algorithms rank in top positions.
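The idea of applying the ensemble process during model construction can be illustrated with a minimal sketch of a single split decision. The abstract does not give the voting details, so everything here is an assumption made for illustration: the helper names (`draw_subsamples`, `best_attribute`, `consolidated_split`), the use of information gain as the split criterion, plain majority voting, and the restriction to discrete attributes. It is not the authors' implementation.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting on the discrete attribute at index attr."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels, attrs):
    """Attribute that a single subsample would choose on its own."""
    return max(attrs, key=lambda a: information_gain(rows, labels, a))


def draw_subsamples(rows, labels, n_samples, size, seed=0):
    """Draw n_samples random subsamples (without replacement) of the given size."""
    rng = random.Random(seed)
    indices = list(range(len(rows)))
    samples = []
    for _ in range(n_samples):
        chosen = rng.sample(indices, size)
        samples.append(([rows[i] for i in chosen], [labels[i] for i in chosen]))
    return samples


def consolidated_split(samples, attrs):
    """Each subsample votes for its best attribute; the most voted one is
    used for this node in every subsample (assumed majority-vote rule)."""
    votes = Counter(best_attribute(rows, labels, attrs) for rows, labels in samples)
    return votes.most_common(1)[0][0]
```

In a full tree builder, the agreed attribute would then be used to partition every subsample, and the same vote would be repeated at each child node, so that a single tree results rather than one tree per sample.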

PCTBagging: From inner ensembles to ensembles. A trade-off between discriminating capacity and interpretability

Ibarguren, Pérez, Muguerza, et al. 2022

Information Sciences

Igor Ibarguren

Coverage-based resampling: Building robust consolidated decision trees

BFPART: Best-First PART

The Consolidated Tree Construction algorithm in imbalanced defect prediction datasets

CT <DT>: Extending the application of the consolidation methodology even further

PCTBagging: From inner ensembles to ensembles. A trade-off between discriminating capacity and interpretability
