On the relative value of clustering techniques for Unsupervised Effort-Aware Defect Prediction

Yang, Peixin; Zhu, Lin; Zhang, Yanjiao; Ma, Chuanxiang; Liu, Liming; Yu, Xiao; Hu, Wenhua

doi:10.1016/j.eswa.2023.123041

Cited by 5 publications

References 77 publications

Parameter-Efficient Multi-classification Software Defect Detection Method Based on Pre-trained LLMs

Wang, Lu, Yang, et al. 2024

Int J Comput Intell Syst

Software Defect Detection (SDD) has always been critical to the development life cycle. A stable defect detection system can not only alleviate the workload of software testers but also enhance the overall efficiency of software development. Researchers have recently proposed various artificial intelligence-based SDD methods and achieved significant advancements. However, these methods still exhibit limitations in terms of reliability and usability. Therefore, we introduce MSDD-(IA)3, a novel framework leveraging the pre-trained CodeT5+ and (IA)3 for parameter-efficient multi-classification SDD. This framework constructs a detection model based on pre-trained CodeT5+ to generate code representations while capturing defect-prone features. Considering the high overhead of pre-trained LLMs, we inject (IA)3 vectors into specific layers and update only these injected parameters, which reduces the training cost. Furthermore, leveraging the properties of the pre-trained CodeT5+, we design a novel feature sequence that enriches the input data by combining source code with Natural Language (NL)-based expert metrics. Our experimental results on 64K real-world Python snippets show that MSDD-(IA)3 outperforms state-of-the-art SDD methods, including PM2-CNN, in terms of F1-weighted, Recall-weighted, Precision-weighted, and Matthews Correlation Coefficient. Notably, the trainable parameters of MSDD-(IA)3 amount to only 0.04% of those of the original CodeT5+. Our experimental data and code are available at https://gitee.com/wxyzjp123/msdd-ia3/.
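The core (IA)3 idea described above is to freeze the pre-trained weights and train only small injected vectors. The sketch below illustrates it and is not the MSDD-(IA)3 code. It assumes the general (IA)3 formulation, in which a learned vector initialized to ones rescales the outputs of a frozen layer element-wise. The class `IA3Linear` and its methods are hypothetical names chosen for illustration.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Plain matrix-vector product: one dot product per output row."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


class IA3Linear:
    """Hypothetical (IA)3-style layer: a frozen linear projection whose
    outputs are rescaled element-wise by a learned vector."""

    def __init__(self, weight: Matrix) -> None:
        self.weight = weight  # frozen pre-trained weights, never updated
        # The injected vector starts at ones, so before any training the
        # adapted layer behaves exactly like the pre-trained one.
        self.scale = [1.0] * len(weight)

    def __call__(self, x: Vector) -> Vector:
        # Rescale each output activation by its own learned factor.
        return [s * h for s, h in zip(self.scale, matvec(self.weight, x))]

    def trainable_params(self) -> int:
        return len(self.scale)  # only the injected vector is trained

    def frozen_params(self) -> int:
        return sum(len(row) for row in self.weight)


layer = IA3Linear([[1.0, 2.0], [3.0, 4.0]])
print(layer([1.0, 1.0]))    # identical to the frozen layer at init
layer.scale = [0.5, 2.0]    # pretend these values were learned
print(layer([1.0, 1.0]))
print(layer.trainable_params(), layer.frozen_params())
```

For a frozen d×d projection, the injected vector adds only d trainable parameters against d² frozen ones. This is why the trainable share of a large model can be a tiny fraction of its total size.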

PMTT: Parallel multi-scale temporal convolution network and transformer for predicting the time to aging failure of software systems

Jia, Yu, Zhang, et al. 2024

Journal of Systems and Software

Software Defect Prediction Method Based on Clustering Ensemble Learning

Tao, Cao, Chen, et al. 2024

IET Software

Software defect prediction aims to assess and predict potential defects in software projects, and the technique has made significant progress in recent years. Previous studies have relied largely on supervised learning methods, which require a substantial amount of labeled historical defect data to train the models, and obtaining such labeled data often demands significant time and resources. In contrast, software defect prediction based on unsupervised learning does not depend on labeled data. This eliminates the need for large-scale data labeling, saving considerable time and resources while providing a more flexible way to ensure software quality. This paper performs software defect prediction with unsupervised learning methods on data from 16 projects across two public datasets (PROMISE and NASA). For the feature selection step, a chi-squared sparse feature selection method is proposed that combines chi-squared tests with sparse principal component analysis (SPCA). The chi-squared test first filters out the most statistically significant features, and SPCA then reduces the dimensionality of these significant features. For the clustering step, the dot product matrix and the Pearson correlation coefficient (PCC) matrix are used to construct weighted adjacency matrices, and a clustering overlap method is proposed. This method integrates spectral clustering, Newman clustering, fluid clustering, and Clauset–Newman–Moore (CNM) clustering through ensemble learning. Experimental results indicate that, in the absence of labeled data, the chi-squared sparse method demonstrates superior performance for feature selection, and the proposed clustering overlap method outperforms or is comparable to the four baseline clustering methods.
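The abstract does not say exactly how the dot product and PCC matrices are turned into weighted adjacency matrices. The sketch below makes several assumptions. Nodes are software modules, one per row of the feature matrix. Each edge weight is the dot product or the Pearson correlation between two modules' feature vectors. Self-loops are zeroed. Negative values are clipped to zero, which is one common way to keep edge weights non-negative for graph clustering. The function names are hypothetical.

```python
import math
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def dot(x: Vector, y: Vector) -> float:
    """Dot product between two modules' feature vectors."""
    return sum(a * b for a, b in zip(x, y))


def pearson(x: Vector, y: Vector) -> float:
    """Pearson correlation coefficient between two feature vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    denom = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denom == 0.0:
        return 0.0  # constant vector: correlation undefined, treat as no edge
    return sum(a * b for a, b in zip(dx, dy)) / denom


def adjacency(rows: Matrix, similarity: Callable[[Vector, Vector], float]) -> Matrix:
    """Symmetric weighted adjacency matrix with a zero diagonal.

    Assumption: negative similarities are clipped to 0 so that all
    edge weights are non-negative.
    """
    n = len(rows)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            w = max(0.0, similarity(rows[i], rows[j]))
            adj[i][j] = adj[j][i] = w
    return adj


# Three toy modules with three metrics each (made-up numbers).
modules = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 2.0, 1.0]]
print(adjacency(modules, pearson))
print(adjacency(modules, dot))
```

In the PCC graph, the first two modules, whose metrics rise together, are joined with weight 1.0. The third module, which is perfectly anti-correlated with both, is left disconnected. A graph-based clustering method such as spectral clustering could then be run on either matrix.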
