Classification is a popular approach to predicting software defects. It involves categorizing modules, each represented by a set of metrics or code attributes, as fault-prone (FP) or non-fault-prone (NFP) by means of a classification model. However, low-quality, unreliable, redundant, and noisy data negatively affect the process of discovering knowledge and useful patterns. Researchers therefore need to retrieve relevant data from large record sets using feature selection methods. Feature selection is the process of identifying the most relevant attributes and removing redundant and irrelevant ones. This study investigated the effect of filter feature selection on classification techniques in software defect prediction. Ten publicly available datasets from the NASA and Metric Data Program software repository were used. The most discriminatory attributes of each dataset were evaluated using Principal Component Analysis (PCA), CFS, and FilterSubsetEval. The datasets were then classified by four classifiers chosen for their heterogeneity: Naïve Bayes from the Bayes category, KNN from the instance-based learner category, the J48 decision tree from the trees category, and the multilayer perceptron from the neural network category. The experimental results showed that applying feature selection to the datasets before classification improves software defect prediction and should be encouraged, and that the multilayer perceptron with FilterSubsetEval had the best accuracy. The study concludes that feature selection methods are capable of improving the performance of learning algorithms in software defect prediction.
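To make the idea of filter feature selection concrete, the following is a minimal sketch in plain Python. It is not PCA, CFS, or FilterSubsetEval as used in the study; it assumes a much simpler filter that ranks each attribute by the absolute Pearson correlation between the attribute and the defect label and keeps the top k. The metric names (`loc`, `complexity`, `comments`) and all values are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def select_top_k(rows, labels, names, k):
    """Rank attributes by |correlation with the label| and keep the k best."""
    scores = {}
    for j, name in enumerate(names):
        column = [row[j] for row in rows]
        scores[name] = abs(pearson(column, labels))
    ranked = sorted(names, key=lambda name: scores[name], reverse=True)
    return ranked[:k], scores

# Hypothetical module metrics (made-up values, not from the NASA datasets).
names = ["loc", "complexity", "comments"]
rows = [
    [120, 14, 10],
    [30, 2, 12],
    [200, 25, 11],
    [45, 3, 9],
    [160, 18, 12],
    [25, 1, 10],
]
labels = [1, 0, 1, 0, 1, 0]  # 1 = fault-prone, 0 = non-fault-prone

selected, scores = select_top_k(rows, labels, names, k=2)
print(selected)
```

Because a filter scores attributes independently of any classifier, the reduced attribute set can be passed unchanged to each of the four classifiers being compared.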
Class imbalance is a prevalent problem in machine learning that degrades the prediction performance of classification algorithms, and software defect prediction (SDP) is no exception. Solutions such as data sampling and ensemble methods have been proposed to address the class imbalance problem in SDP. This study proposes a combination of the Synthetic Minority Oversampling Technique (SMOTE) and homogeneous ensemble methods (Bagging and Boosting) for predicting software defects. The proposed approach was implemented using Decision Tree (DT) and Bayesian Network (BN) as base classifiers on defect datasets acquired from the NASA software corpus. The experimental results showed that the proposed approach outperformed the other experimental methods. An accuracy of 86.8% and an area under the receiver operating characteristic curve (AUC) of 0.93 affirmed the ability of the proposed technique to differentiate between defective and non-defective labels without bias.
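The oversampling step can be illustrated with a short sketch of the core SMOTE idea: a synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its nearest minority neighbours. This is a simplified illustration, not the study's implementation. It assumes that the base sample is picked at random rather than by iterating over every minority sample, and the function name `smote`, its parameters, and the data are hypothetical. The Bagging and Boosting stage is omitted.

```python
import random

def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Candidate neighbours: all other minority points, nearest first.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Hypothetical two-metric dataset: 8 non-defective modules, 4 defective ones.
majority = [[10, 1], [12, 2], [15, 1], [11, 3], [14, 2], [13, 1], [16, 2], [12, 3]]
minority = [[80, 12], [95, 15], [70, 10], [88, 14]]

# Generate enough synthetic defective modules to balance the two classes.
new_points = smote(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

In a full pipeline, oversampling would be applied to the training split only, so that no synthetic samples leak into the data used for evaluation.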
Software system failures traced to software testing are rampant because modern software systems are large and complex. Software testing, an integral part of the software development life cycle (SDLC), consumes both human and capital resources. Software defect prediction (SDP) mechanisms are therefore deployed to strengthen the testing phase of the SDLC by predicting defect-prone modules or components in software systems. Machine learning models have been used to develop SDP models with great success. Moreover, some studies have highlighted that combining machine learning models into an ensemble yields better prediction accuracy than single SDP models. However, the measured performance of machine learning models can change with the predictive evaluation metric used. Thus, more studies are needed to establish the effectiveness of ensemble SDP models over single SDP models. This study proposes the deployment of Multi-Criteria Decision Method (MCDM) techniques to rank machine learning models. Two MCDM techniques, Analytic Network Process (ANP) and Preference Ranking Organization Method for Enrichment Evaluation (PROMETHEE), are applied to 9 machine learning models with 11 performance evaluation metrics and 11 software defect datasets. The experimental results showed that ensemble SDP models are the most appropriate SDP models, as Boosted SMO and Boosted PART ranked highest under each of the MCDM techniques. The results also supported the position that accuracy should not be the only performance evaluation metric for SDP models. In conclusion, performance metrics other than predictive accuracy should be considered when ranking and evaluating machine learning models. Keywords: Ensemble; Multi-Criteria Decision Method; Software Defect Prediction
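As an illustration of how an MCDM technique turns several metrics into a single ranking, the sketch below computes PROMETHEE II net outranking flows. It rests on several assumptions not stated in the abstract: the "usual" preference function (an alternative is preferred on a criterion whenever it scores strictly higher), equal criterion weights, and metrics for which higher is better. The model names and scores are made up and are not results from the study.

```python
def promethee_ii(scores, weights):
    """Return the net outranking flow of each alternative (usual preference function)."""
    names = list(scores)
    n = len(names)
    total_w = sum(weights)

    def pref(a, b):
        # Weighted share of criteria on which a scores strictly higher than b.
        wins = sum(w for w, x, y in zip(weights, scores[a], scores[b]) if x > y)
        return wins / total_w

    flows = {}
    for a in names:
        positive = sum(pref(a, b) for b in names if b != a) / (n - 1)
        negative = sum(pref(b, a) for b in names if b != a) / (n - 1)
        flows[a] = positive - negative
    return flows

# Hypothetical scores per model: [accuracy, AUC, F-measure], all made up.
scores = {
    "model_a": [0.80, 0.70, 0.60],
    "model_b": [0.78, 0.85, 0.72],
    "model_c": [0.75, 0.65, 0.55],
}
weights = [1, 1, 1]  # equal importance for every metric (an assumption)

flows = promethee_ii(scores, weights)
ranking = sorted(flows, key=flows.get, reverse=True)
print(ranking)
```

In this made-up example, `model_a` has the highest accuracy but `model_b` ranks first once all three metrics are weighed together, which is the kind of disagreement that motivates ranking models on more than accuracy alone.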