An Empirical Study for Enhanced Software Defect Prediction Using a Learning-Based Framework

Bashir, Kamal; Li, Tianrui; Yohannese, Chubato Wondaferaw

doi:10.2991/ijcis.2018.125905638

Cited by 10 publications

(6 citation statements)

References 25 publications


“…Class imbalance in classification models represents those situations where the number of examples of one class is much smaller than others (Bashir et al, 2018). If the model is trained on imbalanced datasets, the prediction results will be biased towards the majority class.…”

Section: Class Imbalance and Sampling Techniques (mentioning)

confidence: 99%

A novel approach for software defect prediction using CNN and GRU based on SMOTE Tomek method

Khleel

Nehéz

2023

J Intell Inf Syst


Software defect prediction (SDP) plays a vital role in enhancing the quality of software projects and reducing maintenance-based risks through the ability to detect defective software components. SDP refers to using historical defect data to construct a relationship between software metrics and defects via diverse methodologies. Several prediction models, based on machine learning (ML) and deep learning (DL), have been developed and adopted to recognize software module defects, and many methodologies and frameworks have been presented. Class imbalance is one of the most challenging problems these models face in binary classification. When the distribution of classes is imbalanced, the accuracy may be high, but the models cannot recognize data instances in the minority class, leading to weak classifications. So far, few studies have addressed the problem of class imbalance in SDP. In this study, a data sampling method is introduced to address the class imbalance problem and improve the performance of ML models in SDP. The proposed approach is based on a convolutional neural network (CNN) and a gated recurrent unit (GRU) combined with the synthetic minority oversampling technique plus Tomek links (SMOTE Tomek) to predict software defects. To establish the efficiency of the proposed models, experiments were conducted on benchmark datasets obtained from the PROMISE repository. The experimental results were compared and evaluated in terms of accuracy, precision, recall, F-measure, Matthews correlation coefficient (MCC), the area under the ROC curve (AUC), the area under the precision-recall curve (AUCPR), and mean square error (MSE). The experimental results showed that the proposed models predict software defects more effectively on the balanced datasets than on the original datasets, with an improvement of up to 19% for the CNN model and 24% for the GRU model in terms of AUC. We compared our proposed approach with existing SDP approaches based on several standard performance measures. The comparison results demonstrated that the proposed approach significantly outperforms existing state-of-the-art SDP approaches on most datasets.
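The claim that accuracy can be high while the minority class goes unrecognized can be illustrated with a small sketch. The data below are hypothetical (95 non-defective and 5 defective modules, not taken from the PROMISE datasets), and the helper names `confusion_counts`, `accuracy`, `recall`, and `mcc` are introduced only for this illustration. The "classifier" simply predicts the majority class for every module.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn), treating label 1 (defective) as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)


def recall(tp, tn, fp, fn):
    # Share of defective modules that were actually found.
    return tp / (tp + fn) if (tp + fn) else 0.0


def mcc(tp, tn, fp, fn):
    # Matthews correlation coefficient; defined as 0 when the denominator is 0.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical imbalanced data: 95 clean modules (0) and 5 defective ones (1).
y_true = [0] * 95 + [1] * 5
# A degenerate model that always predicts the majority class.
y_majority = [0] * 100

counts = confusion_counts(y_true, y_majority)
print("accuracy:", accuracy(*counts))
print("recall:  ", recall(*counts))
print("MCC:     ", mcc(*counts))
```

Accuracy looks strong here even though no defective module is detected, which is why imbalance-aware measures such as recall and MCC are reported alongside it.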



“…Class imbalance in classification models represents those situations, where the number of examples of one class is much smaller than other classes. The class with the higher size of data is the majority class, while the class with a smaller size is considered as minority class [42]. Class imbalance is an important special of the software defects data, which consists of only a few defective instances and there are large number of non-defective instances.…”

Section: 4 Class Imbalance and Sampling Techniques (mentioning)

confidence: 99%

A Novel Approach for Software Defect Prediction using CNN and GRU Based on SMOTE Tomek Method

Khleel

Nehéz

2022

Preprint


Software defect prediction (SDP) plays an important role in enhancing the quality of software projects and reducing maintenance-based risks through the ability to detect defective software components. SDP refers to methods that use historical defect data to build the relationship between software metrics and software defects. Several prediction models based on machine learning (ML) and deep learning (DL) have been developed and adopted to recognize defects in software modules, and many methodologies and frameworks have been presented. One of the most difficult problems that these models face in binary classification is class imbalance. When the distribution of classes is unbalanced, the accuracy may be high, but the model cannot recognize data instances in the minority class, which leads to weak classifications. So far, few studies have addressed the problem of class imbalance in SDP. To address the class imbalance problem, we propose a novel SDP approach based on a convolutional neural network (CNN) and a gated recurrent unit (GRU) combined with the synthetic minority oversampling technique plus Tomek links (SMOTE Tomek). To establish the efficiency of the proposed models, experiments were conducted on benchmark datasets obtained from the PROMISE repository, and the experimental results were compared and evaluated in terms of accuracy, precision, recall, F-measure, the area under the ROC curve (AUC), the area under the precision-recall curve (AUCPR), and mean square error (MSE). The average accuracy of the proposed models on the original datasets was 89% for CNN and 87% for GRU, while the average accuracy of the proposed models on the balanced datasets was 94% for CNN and 92% for GRU. The results showed that the proposed models on the balanced datasets improve the average accuracy by 5% for both models compared to the original datasets. This indicates the positive effects of combining ML techniques with data balancing methods on the performance of defect prediction for datasets with imbalanced class distributions.
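The SMOTE Tomek idea named in the abstract can be sketched in plain Python. This is an illustrative sketch, not the authors' implementation: the function names (`smote`, `tomek_links`, `smote_tomek`), the toy data, and the parameter defaults are all assumptions. SMOTE creates synthetic minority points by interpolating between a minority point and one of its nearest minority neighbors; a Tomek link is a pair of opposite-class points that are each other's nearest neighbor. The sketch removes both points of every link, which is one common variant.

```python
import math
import random


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        others = [j for j in range(len(minority)) if j != i]
        # k nearest minority neighbors of point i (needs at least 2 minority points).
        neighbors = sorted(
            others, key=lambda j: math.dist(minority[i], minority[j])
        )[:k]
        j = rng.choice(neighbors)
        gap = rng.random()  # position along the segment from point i to point j
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(minority[i], minority[j]))
        )
    return synthetic


def tomek_links(X, y):
    """Return index pairs (i, j) of opposite-class mutual nearest neighbors."""

    def nearest(i):
        return min(
            (j for j in range(len(X)) if j != i),
            key=lambda j: math.dist(X[i], X[j]),
        )

    links = []
    for i in range(len(X)):
        j = nearest(i)
        if i < j and y[i] != y[j] and nearest(j) == i:
            links.append((i, j))
    return links


def smote_tomek(X, y, minority_label=1, k=2, seed=0):
    """Oversample the minority class to parity, then drop Tomek-link points."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_new = (len(X) - len(minority)) - len(minority)
    new_points = smote(minority, n_new, k, seed)
    X_res = list(X) + new_points
    y_res = list(y) + [minority_label] * len(new_points)
    drop = {i for pair in tomek_links(X_res, y_res) for i in pair}
    keep = [i for i in range(len(X_res)) if i not in drop]
    return [X_res[i] for i in keep], [y_res[i] for i in keep]


# Hypothetical two-feature data: 6 non-defective (0) and 2 defective (1) modules.
X = [(0.0, 0.0), (0.5, 0.2), (1.0, 0.1), (0.2, 0.9),
     (0.8, 0.8), (1.2, 1.1), (4.0, 4.0), (4.5, 4.2)]
y = [0] * 6 + [1] * 2

X_bal, y_bal = smote_tomek(X, y)
print("class counts after resampling:", y_bal.count(0), y_bal.count(1))
```

In practice a maintained library implementation would normally be preferred over a hand-written one; the sketch only shows the two steps that the method combines.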


“…Finding potential defects among millions of code lines and thousands of documents is a difficult task for a software tester [1]. However, finding as many errors as feasible is vital to improving software quality, particularly in some critical scenarios where even a minor undiscovered software failure might have devastating effects [2,3]. Software fault prediction which is entrenched in static code features and can lead engineers to discover problem-prone modules earlier instead of random inspection in the sector has drawn increasing interest from researchers in recent years [4][5][6].…”

Section: Introduction (mentioning)

confidence: 99%

Software Defect Prediction Based Ensemble Approach

Harikiran

Chandana

Srinivasarao

et al. 2023

Computer Systems Science and Engineering


Software systems have grown significantly in size and complexity. As a result, preventing software faults is extremely difficult. Software defect prediction (SDP) can assist developers in finding potential bugs and reducing maintenance costs. When it comes to lowering software costs and assuring software quality, SDP plays a critical role in software development. As a result, automatically forecasting the number of errors in software modules is important, and it may assist developers in allocating limited resources more efficiently. Several methods for detecting and addressing such flaws at a low cost have been offered. These approaches, however, need to be significantly improved in terms of performance. Therefore, in this paper, two deep learning (DL) models, a multilayer perceptron (MLP) and a deep neural network (DNN), are proposed. The proposed approaches combine the newly established Whale optimization algorithm (WOA) with the complementary Firefly algorithm (FA) to establish the emphasized metaheuristic search EMWS algorithm, which selects fewer but closely related representative features. To find the best-implemented classifier in terms of prediction achievement measurement factor, classifiers were applied to five PROMISE repository datasets. When compared to existing methods, the proposed technique for SDP outperforms them, with 0.91% accuracy for the JM1 dataset, 0.98% accuracy for the KC2 dataset, 0.91% accuracy for the PC1 dataset, 0.93% accuracy for the MC2 dataset, and 0.92% accuracy for KC3.

