A novel approach to software defect prediction on unlabeled datasets is proposed, based on a modified objective cluster analysis (OCA). The first step is to construct the distance matrix of the instances in a dataset, using the clusters determined automatically by the modified OCA. Next, the dipoles within different instances are categorized into two groups. Finally, the clusters of instances are produced, and software defects are predicted by imposing a modified consistency criterion. A case study and comparative experiments were conducted on 12 public datasets selected from the Promise and ReLink repositories, using several unsupervised algorithms and cross-project approaches. There were two experimental settings: datasets containing all metrics, and datasets containing only module size metrics. The results were evaluated by precision, recall, F-measure, and the area under the receiver operating characteristic curve (AUC). A complexity analysis of the tested algorithms was also conducted. In the experiments with all metrics, the proposed OCA obtained the best results on all four indexes, and the average values of precision, recall, F-measure, and AUC improved by a minimum of 1.52%, 2.78%, 19.84%, and 0.93%, respectively. In the experiments with only module size metrics, the proposed OCA again obtained the best results on the four indexes, and the average values of precision, F-measure, and AUC improved by a minimum of 8.8%, 2.59%, and 8.36%, respectively. The proposed algorithm has low complexity and provides a new way to predict software defects efficiently on unlabeled datasets.
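The evaluation indexes named in this abstract can be illustrated with a minimal sketch. It assumes binary labels (1 = defective, 0 = defect-free); the function name `precision_recall_f` is hypothetical and is not taken from the paper, which does not publish its evaluation code here.

```python
from typing import List, Tuple


def precision_recall_f(y_true: List[int], y_pred: List[int]) -> Tuple[float, float, float]:
    """Return (precision, recall, F-measure) for binary labels, 1 = defective."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Guard against empty denominators (no predicted or no actual defects).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


if __name__ == "__main__":
    print(precision_recall_f([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

The F-measure here is the harmonic mean of precision and recall; AUC is omitted because it needs ranked scores rather than hard cluster labels.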
This study addresses software defect prediction, for both ranking and classification, using the self-organizing data mining method. A causal relation between software metrics and defects in software modules is established. In the analysis, software metric parameters are treated as the influencing factors and independent variables; the defect label values of software modules are treated as dependent variables. For ranking prediction, during model training the bug values of the defect-free modules are replaced with a negative value, and those of the defective modules remain unchanged. For classification prediction, the false values of the defect-free modules are replaced with a negative value, whereas the true values of the defective modules are replaced with a positive value ≥1.5. Case studies and comparisons are then conducted on datasets from NASA, SoftLab, and Promise using different algorithms. The results show that in the ranking tests, the self-organizing data mining method achieves the smallest errors. In the classification tests, the F-measure values obtained by the self-organizing data mining method are the best among the tested algorithms. The self-organizing data mining method is highly efficient and feasible for predicting software defects. INDEX TERMS: Label function, software defect prediction, software metrics, self-organizing data mining.
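The label replacement described in this abstract can be sketched as follows. The abstract only says "a negative value" for defect-free modules, so `NEGATIVE_LABEL = -1.0` is an assumption; `POSITIVE_LABEL = 1.5` is the smallest value allowed by the stated "≥1.5". The function names are hypothetical.

```python
from typing import List

NEGATIVE_LABEL = -1.0  # assumption: the abstract only specifies "a negative value"
POSITIVE_LABEL = 1.5   # the abstract specifies a positive value >= 1.5


def ranking_targets(bug_counts: List[int]) -> List[float]:
    """Ranking: defect-free modules get a negative value, others keep their bug value."""
    return [float(b) if b > 0 else NEGATIVE_LABEL for b in bug_counts]


def classification_targets(defective: List[bool]) -> List[float]:
    """Classification: False -> negative value, True -> positive value >= 1.5."""
    return [POSITIVE_LABEL if d else NEGATIVE_LABEL for d in defective]


if __name__ == "__main__":
    print(ranking_targets([0, 3, 1]))
    print(classification_targets([True, False]))
```

These transformed values would then serve as the dependent variable when the model is trained on the software metrics.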
$$
\begin{aligned}
x_{49} &= 1.212 z_{82} - 0.1291 z_{81} z_{82} + 0.04893 z_{82}^{2} + 0.178 \\
z_{81} &= 1.737 x_{45} - 0.6062 \\
z_{82} &= 1.076 z_{72} + 0.1631 z_{71} z_{72} \\
z_{71} &= 0.03764 x_{22} - 4.945 \\
z_{72} &= -0.7929 z_{61} + 1.735 z_{62} \\
z_{61} &= 0.7773 z_{22} + 0.05725 z_{21} z_{22} \\
z_{21} &= 0.9019 z_{12} + 0.08939 z_{11} z_{12} \\
z_{11} &= 0.1439 x_{19} - 0.9916 \\
z_{12} &= 0.003072 x_{2} - 0.05708 x_{9} + 0.001641 x_{2} x_{9} - 0.3778 \\
z_{22} &= -0.04946 z_{11} + 1.15 z_{12} - 0.06492 z_{11} z_{12} \\
z_{11} &= 0.4201 x_{36} - 0.6082 \\
z_{12} &= 0.003072 x_{2} - 0.05708 x_{9} + 0.001641 x_{2} x_{9} - 0.3778 \\
z_{62} &= 0.5069 z_{51} + 0.5196 z_{52} \\
z_{51} &= 1.088 z_{42} + 0.3919 z_{41} z_{42} \\
z_{41} &= 2.005 x_{43} - 0.2611 \\
z_{42} &= 0.9622 z_{32} - 0.221 z_{31} z_{32} + 0.1441 z_{32}^{2} \\
z_{31} &= 0.002816 x_{1} + 0.0001271 x_{1} x_{46} - 0.4549 \\
z_{32} &= 0.7712 z_{22} + 0.1186 z_{21} z_{22} + 0.05782 z_{22}^{2} \\
z_{21} &= 343.2 x_{38} - 0.0417 \\
z_{22} &= -0.04946 z_{11} + 1.15 z_{12} - 0.06492 z_{11} z_{12} \\
z_{11} &= 0.4201 x_{36} - 0.6082 \\
z_{12} &= 0.003072 x_{2} - 0.05708 x_{9} + 0.001641 x_{2} x_{9} - 0.3778 \\
z_{52} &= 0.3663 z_{41} + 0.4985 z_{42} + 0.0544 z_{42}^{2} \\
z_{41} &= 0.004447 x_{2} + 0.1135 x_{6} - 0.0008241 x_{6}^{2} - 0.4806 \\
z_{42} &= 0.9329 z_{32} - 0.08852 z_{31}^{2} + 0.0607 z_{32}^{2}
\end{aligned}
$$
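To make the layered structure of these polynomials concrete, the sketch below evaluates one self-contained sub-tree from the listing: the node z_22 built from z_11 (input x_36) and z_12 (inputs x_2, x_9). The coefficients are copied from the equations above; the function names and the input values used in the demo are illustrative assumptions, not data from the paper.

```python
def z11(x36: float) -> float:
    # First-layer node depending on metric x_36.
    return 0.4201 * x36 - 0.6082


def z12(x2: float, x9: float) -> float:
    # First-layer node depending on metrics x_2 and x_9, with an interaction term.
    return 0.003072 * x2 - 0.05708 * x9 + 0.001641 * x2 * x9 - 0.3778


def z22(x2: float, x9: float, x36: float) -> float:
    # Second-layer node combining the two first-layer outputs.
    a = z11(x36)
    b = z12(x2, x9)
    return -0.04946 * a + 1.15 * b - 0.06492 * a * b


if __name__ == "__main__":
    # Hypothetical metric values, for illustration only.
    print(z22(x2=10.0, x9=2.0, x36=1.0))
```

Each higher layer is evaluated the same way: compute the two child nodes, then apply the parent's polynomial to their outputs.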
A power law describes a common behavior in which a few factors play a decisive role in an outcome. Most software defects occur in very few instances. In this study, we propose a novel approach that uses the characteristics of the power law function for software defect prediction. The first step is to establish the power law function for the majority of metrics in a software system. The maximal curvature value of the power law function is then applied as the threshold for determining higher metric values. Next, the total number of higher metric values is counted in each instance. Finally, the statistical data are clustered into two categories: defect-free and defect-prone instances. Case studies and a comparison were conducted on twelve public datasets from Promise, SoftLab, and ReLink using five different algorithms. The results indicate that the precision, recall, and F-measure values obtained by the proposed approach are the best among the five tested algorithms; the average values of recall and F-measure improved by 14.3% and 6.0%, respectively. Furthermore, the complexity of the proposed approach based on the power law function is O(2n), which is the lowest among the five tested algorithms. The proposed approach is thus shown to be feasible and highly efficient for software defect prediction on unlabeled datasets.
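The thresholding and counting steps can be sketched under stated assumptions. The abstract does not say how the power law is fitted or how the maximal curvature point is located, so this sketch assumes a fitted curve y = a·x^b is already given, finds the point of maximal curvature by a simple grid search, and uses its x-position as the threshold. All function names and parameters are hypothetical.

```python
from typing import List


def curvature(x: float, a: float, b: float) -> float:
    """Curvature of y = a * x**b at x: |y''| / (1 + y'^2)^(3/2)."""
    d1 = a * b * x ** (b - 1)
    d2 = a * b * (b - 1) * x ** (b - 2)
    return abs(d2) / (1.0 + d1 * d1) ** 1.5


def max_curvature_x(a: float, b: float, lo: float, hi: float, steps: int = 10000) -> float:
    """Grid search on [lo, hi] for the x where curvature is largest (assumed method)."""
    best_x, best_k = lo, curvature(lo, a, b)
    for i in range(1, steps + 1):
        x = lo + (hi - lo) * i / steps
        k = curvature(x, a, b)
        if k > best_k:
            best_x, best_k = x, k
    return best_x


def count_high_metrics(instance: List[float], thresholds: List[float]) -> int:
    """Count how many metric values of one instance exceed their thresholds."""
    return sum(1 for v, t in zip(instance, thresholds) if v > t)


if __name__ == "__main__":
    print(max_curvature_x(a=1.0, b=-1.0, lo=0.1, hi=5.0))
    print(count_high_metrics([5.0, 1.0, 9.0], [3.0, 3.0, 3.0]))
```

For y = 1/x the curvature peaks at x = 1, which gives a quick sanity check on the search. The per-instance counts would then be the input to the final clustering step.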