Just-in-Time Software Defect Prediction (JIT-SDP) is an SDP approach that makes defect predictions at the software change level. Most existing JIT-SDP work assumes that the characteristics of the problem remain the same over time. However, JIT-SDP may suffer from class imbalance evolution. Specifically, the imbalance status of the problem (i.e., how underrepresented the defect-inducing changes are) may intensify or diminish over time. If this occurs, it could render existing JIT-SDP approaches unsuitable, including those that rebuild classifiers over time using only recent data. This work thus provides the first investigation of whether class imbalance evolution poses a threat to JIT-SDP. The investigation is performed in a realistic scenario by taking into account verification latency, the often overlooked fact that labeled training examples arrive with a delay. Based on 10 GitHub projects, we show that JIT-SDP suffers from class imbalance evolution, which significantly hinders the predictive performance of existing JIT-SDP approaches. Compared to state-of-the-art class imbalance evolution learning approaches, the predictive performance of JIT-SDP approaches was up to 97.2% lower in terms of g-mean. Hence, it is essential to tackle class imbalance evolution in JIT-SDP. We then propose a novel class imbalance evolution approach for the specific context of JIT-SDP. While maintaining top-ranked g-means, this approach managed to produce up to 63.59% more balanced recalls on the defect-inducing and clean classes than state-of-the-art class imbalance evolution approaches. We thus recommend it to avoid overemphasizing one class over the other in JIT-SDP.
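To make the two evaluation criteria in this abstract concrete, the following minimal sketch computes g-mean and the gap between the two class recalls. It assumes the usual definition of g-mean as the geometric mean of the recalls on the defect-inducing class (label 1) and the clean class (label 0); the abstract does not spell the formula out, and the function names `recalls`, `g_mean`, and `recall_gap` are illustrative only.

```python
import math


def recalls(y_true, y_pred):
    """Return (recall on defect-inducing class 1, recall on clean class 0)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    # A class with no examples gets recall 0.0 here (an assumption of this sketch).
    recall1 = tp / (tp + fn) if (tp + fn) else 0.0
    recall0 = tn / (tn + fp) if (tn + fp) else 0.0
    return recall1, recall0


def g_mean(y_true, y_pred):
    """Geometric mean of the two class recalls (assumed definition)."""
    recall1, recall0 = recalls(y_true, y_pred)
    return math.sqrt(recall1 * recall0)


def recall_gap(y_true, y_pred):
    """Absolute difference between the recalls; smaller means more balanced."""
    recall1, recall0 = recalls(y_true, y_pred)
    return abs(recall1 - recall0)


# Illustrative labels only, not data from the study.
y_true = [1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1]
print(g_mean(y_true, y_pred), recall_gap(y_true, y_pred))
```

Under this definition, a classifier that always predicts the clean class has a recall of 0 on the defect-inducing class and therefore a g-mean of 0, however high its accuracy is. This is why g-mean is informative when the classes are imbalanced.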
One-class classification is an important problem with applications in several different areas such as novelty detection, anomaly detection, outlier detection, and machine monitoring. In this paper, we propose two novel methods for one-class classification, referred to as NNDDSRM and kNNDDSRM. The methods are based on the principle of structural risk minimization and the nearest neighbor data description (NNDD) one-class classifier. Experiments carried out using both artificial and real-world datasets show that the proposed methods are able to significantly reduce the number of stored prototypes in comparison to NNDD. The experimental results also show that the proposed methods outperformed NNDD, in terms of the area under the receiver operating characteristic (ROC) curve, on four of the five datasets considered in the experiments and had a similar performance on the remaining one.
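For orientation, here is a minimal sketch of the baseline NNDD decision rule as it is commonly described: a test object is accepted as belonging to the target class when its distance to its nearest training object is no larger than the distance from that training object to its own nearest neighbor in the training set. This is not the proposed NNDDSRM or kNNDDSRM, whose details the abstract does not give. The function names, the `threshold` parameter, and the handling of duplicate training points are assumptions made for illustration.

```python
import math


def nearest(point, candidates):
    """Return (index, distance) of the candidate closest to point."""
    best_index, best_distance = -1, math.inf
    for index, candidate in enumerate(candidates):
        distance = math.dist(point, candidate)
        if distance < best_distance:
            best_index, best_distance = index, distance
    return best_index, best_distance


def nndd_accepts(z, train, threshold=1.0):
    """Baseline NNDD rule (sketch): accept z if d(z, NN(z)) / d(NN(z), NN(NN(z))) <= threshold.

    `train` is a list of at least two target-class points, all of which are stored.
    """
    # Nearest training object to the test object.
    index, dist_to_train = nearest(z, train)
    # Nearest neighbor of that training object among the remaining training objects.
    others = train[:index] + train[index + 1:]
    _, dist_within_train = nearest(train[index], others)
    if dist_within_train == 0.0:
        # Duplicate training points: accept only an exact match (assumption of this sketch).
        return dist_to_train == 0.0
    return dist_to_train / dist_within_train <= threshold


# Illustrative target-class data: the corners of a unit square.
train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(nndd_accepts((0.5, 0.5), train))  # point inside the square
print(nndd_accepts((5.0, 5.0), train))  # point far from the training data
```

Because this baseline keeps every training object as a prototype, its memory and query cost grow with the training set, which is the cost the abstract says the proposed methods reduce.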