Characterizing research leadership on geographically weighted collaboration network

et al. 2023

Softw Pract Exp

Summary: Machine learning‐based code smell detection (CSD) has been demonstrated to be a valuable approach for improving software quality and enabling developers to identify problematic patterns in code. However, previous research has shown that the code smell datasets commonly used to train these models are heavily imbalanced. While some recent studies have explored the use of imbalanced learning techniques for CSD, they have evaluated only a limited number of techniques, so their conclusions about the most effective methods may be biased and inconclusive. To thoroughly evaluate the effect of imbalanced learning techniques for machine learning‐based CSD, we examine 31 imbalanced learning techniques with seven classifiers to build CSD models on four code smell data sets. We employ four evaluation metrics to assess the detection performance with the Wilcoxon signed‐rank test and Cliff's δ. The results show that (1) not all imbalanced learning techniques significantly improve detection performance, but deep forest significantly outperforms the other techniques on all code smell data sets; (2) SMOTE (Synthetic Minority Over‐sampling TEchnique) is not the most effective technique for resampling code smell data sets; and (3) the best‐performing imbalanced learning techniques and the top‐3 data resampling techniques have little time cost for code smell detection. Therefore, we provide some practical guidelines. First, researchers and practitioners should select appropriate imbalanced learning techniques (e.g., deep forest) to ameliorate the class imbalance problem; in contrast, the blind application of imbalanced learning techniques could be harmful. Second, data resampling techniques better than SMOTE should be selected to preprocess the code smell data sets.
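The summary names Cliff's delta as the effect size reported alongside the Wilcoxon signed‐rank test. As a minimal sketch of the standard definition (not the authors' implementation; the function name `cliffs_delta` and the sample values are illustrative assumptions), the statistic counts how often a value from one group exceeds a value from the other, minus the reverse, divided by the number of pairs:

```python
from typing import Sequence


def cliffs_delta(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Cliff's delta: (#pairs with x > y - #pairs with x < y) / (len(xs) * len(ys)).

    Ranges from -1 (every x below every y) to +1 (every x above every y);
    0 means the two groups overlap completely.
    """
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))


if __name__ == "__main__":
    # Hypothetical scores of two techniques across data sets (made-up values).
    technique_a = [0.81, 0.78, 0.90, 0.85]
    technique_b = [0.70, 0.74, 0.79, 0.72]
    print(cliffs_delta(technique_a, technique_b))
```

The pairwise loop is quadratic in the sample sizes, which is acceptable for the handful of data sets and repetitions typical of such comparisons.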

Section: Methods (mentioning)

confidence: 99%

“…The metrics are widely used in both software engineering studies [64][65][66][67][68][69][70][71] and artificial intelligence researches. [72][73][74][75] In the binary classification problem, these four evaluation metrics can be calculated according to a confusion matrix, as shown in Table 4.…”

Section: Performance Measures (mentioning)

confidence: 99%

On the relative value of imbalanced learning for code smell detection

Zou

et al. 2023

Softw Pract Exp

“…Therefore, it is necessary to take the effort into consideration for defect prediction. In this work, we deploy six different effort-aware evaluation metrics to measure the prediction results of EADP models, some of which are also widely used in the machine learning field [3,[36][37][38][39][40][41][42][43]. Similar to the previous EADP studies, we restrict the limited effort to 20% of the total LOC of one dataset in our work.…”

Section: Evaluation Metric (mentioning)

confidence: 99%

The impact of feature selection techniques on effort‐aware defect prediction: An empirical study

et al. 2023

IET Software

Effort‐Aware Defect Prediction (EADP) methods sort software modules based on the defect density and guide the testing team to inspect the modules with high defect density first. Previous studies indicated that some feature selection methods could improve the performance of Classification‐Based Defect Prediction (CBDP) models, and the Correlation‐based feature subset selection method with the Best First strategy (CorBF) performed the best. However, the practical benefits of feature selection methods on EADP performance are still unknown, and blindly employing the best‐performing CorBF method in CBDP to pre‐process the defect datasets may not improve the performance of EADP models but possibly result in performance degradation. To assess the impact of the feature selection techniques on EADP, a total of 24 feature selection methods with 10 classifiers embedded in a state‐of‐the‐art EADP model (CBS+) on the 41 PROMISE defect datasets were examined. We employ six evaluation metrics to assess the performance of EADP models comprehensively. The results show that (1) The impact of the feature selection methods varies across classifiers and datasets. (2) The four wrapper‐based feature subset selection methods with forwards search, that is, AdaBoost with Forwards Search, Deep Forest with Forwards Search, Random Forest with Forwards Search, and XGBoost with Forwards Search (XGBF), are better than other methods across the studied classifiers and the used datasets, and XGBF with XGBoost as the embedded classifier in CBS+ performs the best on the datasets. (3) The best‐performing CorBF method in CBDP does not perform well on the EADP task. (4) The selected features vary with different feature selection methods and different datasets, and the features noc (number of children), ic (inheritance coupling), cbo (coupling between object classes), and cbm (coupling between methods) are frequently selected by the four wrapper‐based feature subset selection methods with forwards search. (5) Using AdaBoost, deep forest, random forest, and XGBoost as the base classifiers embedded in CBS+ can achieve the best performance. In summary, we recommend that the software testing team employ XGBF with XGBoost as the embedded classifier in CBS+ to enhance the EADP performance.

“…When checking the top 20% LOC according to the predicted result of the EADP model, the software testing team inspects n software modules and finds p actual defective modules with q defects. In our experiments, we utilise several evaluation measures that are commonly adopted in both the software engineering [92][93][94] and machine learning [95][96][97][98][99][100]. Precision@20% is the ratio between the number of actual defective modules and the number of predicted defective modules in the top 20% LOC.…”
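The Precision@20% definition quoted above can be illustrated with a small sketch. It is not the cited authors' code: the `Module` record, the function name, the rule that a module is inspected only if it fits entirely within the LOC budget, and the treatment of every inspected module as predicted defective are all assumptions made for illustration. With n inspected modules of which p are actually defective, the value computed is p / n.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Module:
    loc: int                  # lines of code in the module
    predicted_density: float  # model's predicted defect density (ranking score)
    defective: bool           # ground truth: does the module contain a defect?


def precision_at_effort(modules: List[Module], effort_ratio: float = 0.2) -> float:
    """Rank modules by predicted defect density, inspect them until the LOC
    budget (effort_ratio of total LOC) is used up, and return p / n."""
    budget = effort_ratio * sum(m.loc for m in modules)
    ranked = sorted(modules, key=lambda m: m.predicted_density, reverse=True)

    inspected: List[Module] = []
    used_loc = 0
    for m in ranked:
        # Assumption: stop at the first module that would exceed the budget.
        if used_loc + m.loc > budget:
            break
        inspected.append(m)
        used_loc += m.loc

    n = len(inspected)                        # modules inspected
    p = sum(m.defective for m in inspected)   # actual defective modules found
    return p / n if n else 0.0


if __name__ == "__main__":
    # Made-up example: 100 LOC in total, so the 20% budget is 20 LOC.
    demo = [
        Module(loc=10, predicted_density=0.9, defective=True),
        Module(loc=10, predicted_density=0.8, defective=False),
        Module(loc=80, predicted_density=0.1, defective=True),
    ]
    print(precision_at_effort(demo))
```

Other cut-off conventions (for example, counting a partially inspected module) would change n and therefore the result, so the boundary rule should be checked against the study being reproduced.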

Section: Effort-aware Evaluation Metrics (mentioning)

confidence: 99%

Revisiting ‘revisiting supervised methods for effort‐aware cross‐project defect prediction’

Yang

et al. 2023

IET Software

Effort-aware cross-project defect prediction (EACPDP), which uses cross-project software modules to build a model to rank within-project software modules based on the defect density, has been suggested as a way to allocate limited testing resources efficiently. Recently, Ni et al. proposed an EACPDP method called EASC, which used all cross-project modules to train a model without considering the data distribution difference between cross-project and within-project data. In addition, Ni et al. employed different defect density calculation strategies when comparing EASC and baseline methods. To explore the effective defect density calculation strategies and methods on EACPDP, the authors compare four data filtering methods and five transfer learning methods with EASC using four commonly used defect density calculation strategies. The authors use three classification evaluation metrics and seven effort-aware metrics to assess the performance of methods on 11 PROMISE datasets comprehensively. The results show that (1) The classification before sorting (CBS+) defect density calculation strategy achieves the best overall performance. (2) Using balanced distribution adaptation (BDA) and joint distribution adaptation (JDA) with the K-nearest neighbour classifier to build the EACPDP model can find 15% and 14.3% more defective modules and 11.6% and 8.9% more defects while achieving acceptable initial false alarms (IFA). (3) Better comprehensive classification performance of the methods can bring better EACPDP performance to some extent. (4) A flexible adjustment of the defect threshold λ of the CBS+ strategy contributes to different goals. In summary, the authors recommend that researchers and practitioners use BDA and JDA with the CBS+ strategy to build the EACPDP model.

KEYWORDS: data mining, quality assurance, software engineering, software maintenance, software metrics, software quality

INTRODUCTION

Computer software, which may fail because of quality problems, is widely used across industries today. As software plays an increasingly important role in various fields, ensuring software reliability is one of the issues people are more concerned about [1][2][3]. Software defects are potential sources of errors, failures, and crashes of associated systems. However, the increase in the scale of the software makes defect inspection and fixing more time-consuming. Worse yet, software

This is an open access article under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs License, which permits use and distribution in any medium, provided the original work is properly cited, the use is non-commercial and no modifications or adaptations are made.