Achieving success in software projects remains an important challenge. A major focus of the software engineering community is predicting software defects from the history of classes and other code elements. However, these defect prediction techniques are effective only when there is enough data to train the prediction model. Cross-project defect prediction mitigates this problem by training on data from other projects. The purpose of this investigation is twofold: first, to replicate the experiments of the original paper, and second, to explore additional defect prediction settings with the aim of providing new insights into the best approach. Three composite algorithms are studied, namely AvgVoting, MaxVoting, and BaggingJ48, which integrate multiple machine learning classifiers to improve cross-project defect prediction. The experiments apply preprocessing methods (normalization and standardization) as well as feature selection. The replicated experiments confirm the original findings for all three methods when raw data is used. When normalization is applied, the results improve on those of the original paper, and feature selection improves them further. In the original paper, MaxVoting performs best in terms of F-measure, and BaggingJ48 performs best in terms of cost-effectiveness. The current experiments reproduce the F-measure ranking: MaxVoting performs best, followed by AvgVoting and then BaggingJ48. Overall, our results reinforce the original outcome: the original study is confirmed on raw data, and preprocessing and feature selection yield better results.
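To make the three composite strategies concrete, the sketch below approximates them with scikit-learn in a cross-project setting (fit on a source project, evaluate on a target project). This is an illustrative reconstruction, not the study's code: J48 (a C4.5 implementation) is stood in for by sklearn's CART DecisionTreeClassifier, the base-classifier pool and all parameters are assumptions, and the random arrays are placeholders for real project metrics.

```python
# Minimal sketch (not the authors' code) of the three composite algorithms.
import numpy as np
from sklearn.ensemble import BaggingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier

def build_models():
    # Hypothetical base-classifier pool; the paper's exact pool may differ.
    base = [("lr", LogisticRegression(max_iter=1000)),
            ("nb", GaussianNB()),
            ("dt", DecisionTreeClassifier())]
    return {
        # AvgVoting: average the classifiers' defect probabilities ("soft" voting).
        "AvgVoting": VotingClassifier(estimators=base, voting="soft"),
        # MaxVoting: majority vote over the classifiers' hard predictions.
        "MaxVoting": VotingClassifier(estimators=base, voting="hard"),
        # BaggingJ48: bootstrap aggregation over decision trees (CART as J48 stand-in).
        "BaggingJ48": BaggingClassifier(estimator=DecisionTreeClassifier(),
                                        n_estimators=50),
    }

# Cross-project setting: placeholders stand in for real source/target project data.
X_src, y_src = np.random.rand(200, 20), np.random.randint(0, 2, 200)
X_tgt, y_tgt = np.random.rand(100, 20), np.random.randint(0, 2, 100)

scaler = MinMaxScaler().fit(X_src)  # the normalization step from the study
X_src_n, X_tgt_n = scaler.transform(X_src), scaler.transform(X_tgt)

for name, model in build_models().items():
    model.fit(X_src_n, y_src)
    print(name, model.score(X_tgt_n, y_tgt))
```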
Software defect prediction aims to assess and predict potential defects in software projects and has made significant progress in software development in recent years. Previous studies have largely relied on supervised learning methods, which require a substantial amount of labeled historical defect data to train the models; obtaining these labels often demands significant time and resources. In contrast, software defect prediction based on unsupervised learning does not depend on labeled data, eliminating the need for large-scale data labeling, saving considerable time and resources, and providing a more flexible solution for ensuring software quality. This paper conducts software defect prediction using unsupervised learning methods on data from 16 projects across two public datasets (PROMISE and NASA). For the feature selection step, a chi-squared sparse feature selection method is proposed. This strategy combines chi-squared tests with sparse principal component analysis (SPCA): the chi-squared test first filters out the most statistically significant features, and SPCA then reduces the dimensionality of these significant features. In the clustering step, the dot product matrix and the Pearson correlation coefficient (PCC) matrix are used to construct weighted adjacency matrices, and a clustering overlap method is proposed that integrates spectral clustering, Newman clustering, fluid clustering, and Clauset-Newman-Moore (CNM) clustering through ensemble learning. Experimental results indicate that, in the absence of labeled data, the chi-squared sparse method achieves superior feature selection performance, and the proposed clustering overlap method outperforms or matches the four baseline clustering methods.
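The abstract's two main steps can be sketched as follows, under loud assumptions: scikit-learn's SelectKBest(chi2) and SparsePCA stand in for the chi-squared test and SPCA, and since no defect labels exist in the unsupervised setting, provisional K-means assignments are used here as pseudo-labels for the chi-squared statistic (the abstract does not say how this is handled). Only one ensemble member, spectral clustering on a PCC-weighted adjacency matrix, is shown; the dot-product adjacency and the overlap integration of the four clusterers are omitted.

```python
# Illustrative sketch (an interpretation, not the paper's code).
import numpy as np
from sklearn.cluster import KMeans, SpectralClustering
from sklearn.decomposition import SparsePCA
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.preprocessing import MinMaxScaler

def chi2_sparse_select(X, k_best=10, n_components=5):
    """Chi-squared filtering followed by SPCA dimensionality reduction."""
    # chi2 requires non-negative inputs, so rescale features to [0, 1].
    X_pos = MinMaxScaler().fit_transform(X)
    # Assumption: provisional K-means assignments serve as pseudo-labels,
    # since the unsupervised setting provides no real defect labels.
    pseudo_y = KMeans(n_clusters=2, n_init=10).fit_predict(X_pos)
    X_filt = SelectKBest(chi2, k=k_best).fit_transform(X_pos, pseudo_y)
    return SparsePCA(n_components=n_components).fit_transform(X_filt)

def pcc_spectral_member(X_reduced, n_clusters=2):
    """One ensemble member: spectral clustering on a PCC-weighted adjacency."""
    pcc = np.corrcoef(X_reduced)         # instance-by-instance PCC matrix
    affinity = np.clip(pcc, 0.0, None)   # keep the affinity non-negative
    return SpectralClustering(n_clusters=n_clusters,
                              affinity="precomputed").fit_predict(affinity)

# Synthetic data standing in for one PROMISE/NASA project.
X = np.abs(np.random.randn(150, 20))
labels = pcc_spectral_member(chi2_sparse_select(X))
print(np.bincount(labels))
```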