Software defect prediction (SDP) is an effective technique to lower software module testing costs. However, the imbalanced distribution almost exists in all SDP datasets and restricts the accuracy of defect prediction. In order to balance the data distribution reasonably, we propose a novel resampling method LIMCR on the basis of Naïve Bayes to optimize and improve the SDP performance. The main idea of LIMCR is to remove less-informative majorities for rebalancing the data distribution after evaluating the degree of being informative for every sample from the majority class. We employ 29 SDP datasets from the PROMISE and NASA dataset and divide them into two parts, the small sample size (the amount of data is smaller than 1100) and the large sample size (larger than 1100). Then we conduct experiments by comparing the matching of classifiers and imbalance learning methods on small datasets and large datasets, respectively. The results show the effectiveness of LIMCR, and LIMCR+GNB performs better than other methods on small datasets while not brilliant on large datasets.
Context: There is considerable diversity in the range and design of computational experiments to assess classifiers for software defect prediction. This is particularly so, regarding the choice of classifier performance metrics. Unfortunately some widely used metrics are known to be biased, in particular F 1 . Objective: We want to understand the extent to which the widespread use of the F 1 renders empirical results in software defect prediction unreliable. Method: We searched for defect prediction studies that report both F 1 and the Matthews correlation coefficient (MCC). This enabled us to determine the proportion of results that are consistent between both metrics and the proportion that change. Results: Our systematic review identifies 8 studies comprising 4017 pairwise results. Of these results, the direction of the comparison changes in 23% of the cases when the unbiased MCC metric is employed.
Conclusion:We find compelling reasons why the choice of classification performance metric matters, specifically the biased and misleading F 1 metric should be deprecated.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.