Software defect prediction (SDP) is the technique used to predict the occurrences of defects in the early stages of software development process. Early prediction of defects will reduce the overall cost of software and also increase its reliability. Most of the defect prediction methods proposed in the literature suffer from the class imbalance problem. In this paper, a novel class imbalance reduction (CIR) algorithm is proposed to create a symmetry between the defect and non-defect records in the imbalance datasets by considering distribution properties of the datasets and is compared with SMOTE (synthetic minority oversampling technique), a built-in package of many machine learning tools that is considered a benchmark in handling class imbalance problems, and with K-Means SMOTE. We conducted the experiment on forty open source software defect datasets from PRedict or Models in Software Engineering (PROMISE) repository using eight different classifiers and evaluated with six performance measures. The results show that the proposed CIR method shows improved performance over SMOTE and K-Means SMOTE.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.