2023
DOI: 10.1049/sfw2.12099

The impact of feature selection techniques on effort‐aware defect prediction: An empirical study

Abstract: Effort‐Aware Defect Prediction (EADP) methods sort software modules by defect density and guide the testing team to inspect the modules with the highest defect density first. Previous studies indicated that some feature selection methods could improve the performance of Classification‐Based Defect Prediction (CBDP) models, and that the Correlation‐based feature subset selection method with the Best First strategy (CorBF) performed best. However, the practical benefits of feature selection methods on EADP per…
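Below is a minimal sketch of the effort-aware ranking idea described in the abstract: modules are ordered by predicted defect density (predicted defects per unit of inspection effort, here lines of code) so that high-density modules are inspected first. The class, field names, and data are illustrative assumptions, not the paper's implementation.

```python
# Sketch of effort-aware module ranking: order modules by predicted defect density.
from dataclasses import dataclass


@dataclass
class Module:
    name: str
    loc: int                  # lines of code, used as a proxy for inspection effort
    predicted_defects: float  # output of some defect prediction model (assumed)


def effort_aware_ranking(modules):
    """Sort modules by predicted defect density, highest first."""
    return sorted(modules, key=lambda m: m.predicted_defects / max(m.loc, 1), reverse=True)


if __name__ == "__main__":
    candidates = [
        Module("util.c", loc=1200, predicted_defects=3.0),
        Module("parser.c", loc=300, predicted_defects=2.0),
        Module("io.c", loc=800, predicted_defects=0.5),
    ]
    for m in effort_aware_ranking(candidates):
        print(f"{m.name}: density={m.predicted_defects / m.loc:.4f}")
```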

Citations: cited by 22 publications (12 citation statements)
References: 79 publications
“…The results indicated that the feature selection technique could enhance the performance of the KNN, MLP, and LR classifiers, with the KNN classifier achieving the best performance with seven features. Li et al. [48] investigated how feature selection methods affect the effort-aware defect prediction task, conducting experiments on 41 PROMISE datasets with 24 feature selection methods and 10 classifiers.…”
Section: Empirical Studies of Feature Selection Techniques (mentioning)
confidence: 99%
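As a simplified, hedged illustration of the correlation-based feature filtering evaluated in these studies: the sketch below ranks features by absolute Pearson correlation with the defect label and keeps the top k. The CorBF method named in the abstract additionally searches feature subsets with a best-first strategy, which is not reproduced here; all names and data are illustrative assumptions.

```python
# Simplified correlation-based feature filter (not the full CorBF subset search).
import numpy as np


def top_k_correlated_features(X, y, k=7):
    """Return indices of the k features most correlated (in absolute value) with y."""
    scores = []
    for j in range(X.shape[1]):
        col = X[:, j]
        if np.std(col) == 0:          # a constant feature carries no signal
            scores.append(0.0)
        else:
            scores.append(abs(np.corrcoef(col, y)[0, 1]))
    return np.argsort(scores)[::-1][:k]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 20))    # 20 synthetic "static code metrics"
    y = (X[:, 3] + 0.5 * X[:, 7] + rng.normal(scale=0.5, size=100) > 0).astype(float)
    print(top_k_correlated_features(X, y, k=7))
```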
“…In our empirical study, we use three threshold-dependent evaluation metrics (Precision, Recall, and F-measure (F1)) and one threshold-independent evaluation metric (the Matthews correlation coefficient, MCC) to evaluate the performance of CSD models. These metrics are widely used in both software engineering studies [64][65][66][67][68][69][70][71] and artificial intelligence research [72][73][74][75]. In the binary classification problem, these four evaluation metrics can be calculated from a confusion matrix, as shown in Table 4.…”
Section: Performance Measures (mentioning)
confidence: 99%
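A small sketch of how the four metrics mentioned above (Precision, Recall, F1, and MCC) follow from the entries of a binary confusion matrix. The formulas are the standard ones; the counts in the usage example are made up for illustration.

```python
# Compute Precision, Recall, F1, and MCC from confusion-matrix counts.
import math


def classification_metrics(tp, fp, tn, fn):
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "mcc": mcc}


if __name__ == "__main__":
    print(classification_metrics(tp=30, fp=10, tn=50, fn=10))
```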
“…This helps select suitable algorithms based on software project requirements, addressing ranking instability in EADP studies. Li et al. [19] investigated the impact of feature selection methods on EADP. They employed six effort-aware metrics to comprehensively assess the performance of EADP models.…”
Section: Related Work (mentioning)
confidence: 99%
“…Consequently, Software Defect Prediction (SDP) techniques have garnered significant attention as they enable the optimal utilization of limited resources. Software testing teams construct SDP models from historical software data to predict the defect proneness of the software modules to be inspected [3]. This allows them to allocate testing resources more effectively or to prioritize the inspection of modules that are predicted to be defective, thereby facilitating the efficient allocation of software testing resources.…”
Section: Introduction (mentioning)
confidence: 99%