2019
DOI: 10.11591/ijece.v9i4.pp3241-3246
Ensemble learning for software fault prediction problem with imbalanced data

Abstract: The fault prediction problem has a crucial role in the software development process because it contributes to reducing defects and assists the testing process toward fault-free software components. Therefore, many efforts aim to address this type of issue, in which static code characteristics are usually adopted to construct fault classification models. One of the challenging problems influencing the performance of predictive classifiers is the high imbal…

Cited by 21 publications (9 citation statements)
References 24 publications
“…Using the undersampling method called "near miss," the dataset's distribution was modified. Data sampling methods aim to address class imbalance by manipulating the dataset, typically by removing majority-class samples, to achieve a more balanced distribution [28], [29]. Among these methods, near miss is a technique used to tackle class imbalance by removing instances from the majority class.…”
Section: Class Imbalance and Data Sampling Methods (mentioning)
confidence: 99%
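The near-miss technique described in the excerpt can be sketched as follows. This is a minimal, hypothetical illustration of the NearMiss-1 variant (not the cited paper's implementation): it keeps only the majority-class samples whose mean distance to their k nearest minority-class samples is smallest, discarding the rest.

```python
import math

def near_miss_1(majority, minority, n_keep, k=3):
    """Keep the n_keep majority samples closest (on average) to the
    minority class, per the NearMiss-1 heuristic."""
    scored = []
    for m in majority:
        # Distances from this majority sample to every minority sample,
        # then average over the k nearest.
        d = sorted(math.dist(m, p) for p in minority)
        scored.append((sum(d[:k]) / min(k, len(d)), m))
    # Smallest mean distance first; retain only n_keep samples.
    scored.sort(key=lambda t: t[0])
    return [m for _, m in scored[:n_keep]]

# Example: of three majority points, only the one nearest the minority
# point (1, 1) survives undersampling to a single sample.
kept = near_miss_1([(0.0, 0.0), (5.0, 5.0), (10.0, 10.0)],
                   [(1.0, 1.0)], n_keep=1)
```
In practice a library implementation such as imbalanced-learn's `NearMiss` would be used instead of hand-rolled code; the sketch only shows the selection rule the excerpt refers to.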
“…Khuat and Le [11] applied the SMOTE on a diabetes mellitus dataset for balancing the class distribution of the diabetes mellitus positive and negative class. According to Seo and Kim [12], the synthetic oversampling technique is one of the most powerful techniques widely employed in medicine for imbalanced class distribution.…”
Section: Related Work (mentioning)
confidence: 99%
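The SMOTE oversampling idea mentioned in the excerpt can be sketched in a few lines. This is a simplified, hypothetical version (not the cited papers' code): each synthetic sample is an interpolation between a random minority-class point and one of its k nearest minority-class neighbours.

```python
import random

def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic minority samples by interpolating
    between a minority point and one of its k nearest neighbours."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest other minority points by squared Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p != x),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))[:k]
        n = rng.choice(neighbours)
        # Interpolate at a random fraction t along the segment x -> n.
        t = rng.random()
        out.append(tuple(a + t * (b - a) for a, b in zip(x, n)))
    return out

# Every synthetic point lies between two real minority points.
new_points = smote([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)], n_new=5)
```
A production pipeline would use imbalanced-learn's `SMOTE` class; the sketch only conveys the interpolation mechanism the excerpt credits to Khuat and Le [11] and Seo and Kim [12].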
“…In these studies, it is seen that NB (Arar & Ayan, 2017), Bayesian Network (BN; Pandey et al., 2018), Support Vector Regression (SVR; Kaur et al., 2017; Singh & Chaturvedi, 2013), Decision Tree (DT; Hammouri et al., 2018) and RF (Immaculate et al., 2019) algorithms have been preferred. Some ensemble learning studies used simple majority voting (Khuat & Le, 2019; Kumar et al., 2017) and weighted majority voting (Moustafa et al., 2018) techniques for predicting software bugs. This article prefers NB, MLP, SVM, DT (C4.5), RF and AdaBoost algorithms as base learners for the proposed MVOC approach.…”
Section: Related Work (mentioning)
confidence: 99%
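The two ensemble combination rules contrasted in the excerpt, simple and weighted majority voting, reduce to one small function. This is a generic sketch (not the MVOC approach itself): each base classifier contributes its predicted label with a weight, and the label with the largest weight sum wins; setting all weights to 1 recovers simple majority voting.

```python
from collections import defaultdict

def weighted_vote(predictions, weights):
    """predictions: one label per base classifier; weights: matching
    (e.g. accuracy-derived) weights. Returns the label whose total
    weight is largest."""
    tally = defaultdict(float)
    for label, w in zip(predictions, weights):
        tally[label] += w
    return max(tally, key=tally.get)

# Weighted vote: the two weaker "faulty" votes together outweigh
# the single strong "clean" vote.
winner = weighted_vote(["faulty", "clean", "faulty"], [0.6, 0.9, 0.5])

# Simple majority voting is the special case of equal weights.
majority = weighted_vote(["faulty", "clean", "clean"], [1.0, 1.0, 1.0])
```
The weight source (validation accuracy, F-measure, etc.) is a design choice the cited studies make differently; the function above is agnostic to it.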