2016
DOI: 10.1109/tr.2015.2461676

Empirical Studies of a Two-Stage Data Preprocessing Approach for Software Fault Prediction

Abstract: Software fault prediction is a valuable exercise in software quality assurance to best allocate limited testing resources. Classification is one of the effective methods for software fault prediction. The classification models are trained based on the datasets obtained by mining software historical repositories. However, the performance of the models depends on the quality of datasets. In this paper, we propose a novel two-stage data preprocessing approach which incorporates both feature selection and instance…

Citations: cited by 73 publications (33 citation statements)
References: 59 publications
“…To deal with these issues we have realized that the performance of ELA can be increased by keeping the quality of software defect datasets, which can be done by applying either FS, resolving class imbalance problem and/or filtering noise instances [12,13,18,25,26,41,42,44,47] from defect datasets. In this regard, Liu et al [45] made a comprehensive survey of FS algorithms.…”
Section: Related Work (mentioning)
confidence: 99%
“…However, there is no single best approach for all situations. Liu et al [47] proposed a two-stage data preprocessing approach to improve the quality of software datasets used by classification models for SFP. In the FS stage, they proposed an algorithm which involves both relevance analysis and redundancy control.…”
Section: Related Work (mentioning)
confidence: 99%
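
The citation statement above mentions a feature selection stage that combines relevance analysis with redundancy control. The following is only a minimal sketch of that general idea, not the algorithm from the cited paper: it assumes absolute Pearson correlation as both the relevance and the redundancy measure, and the names `pearson`, `select_features`, and `redundancy_threshold`, as well as the toy metric values, are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def select_features(columns, labels, redundancy_threshold=0.9):
    """Greedy selection: rank by relevance, then drop redundant features.

    columns: dict mapping feature name -> list of values (one per module).
    labels:  list of 0/1 fault labels, same length as each column.
    """
    # Relevance analysis (assumed measure): |correlation| with the fault label.
    relevance = {name: abs(pearson(vals, labels)) for name, vals in columns.items()}
    ranked = sorted(columns, key=lambda name: relevance[name], reverse=True)

    selected = []
    for name in ranked:
        if relevance[name] == 0.0:
            continue  # carries no linear signal about the label
        # Redundancy control: keep a feature only if it is not nearly
        # collinear with any feature that was already selected.
        if all(abs(pearson(columns[name], columns[kept])) < redundancy_threshold
               for kept in selected):
            selected.append(name)
    return selected


# Toy data: six modules with made-up metric values.
labels = [0, 0, 1, 1, 0, 1]
columns = {
    "loc": [10, 20, 80, 90, 15, 70],
    "loc_x2": [20, 40, 160, 180, 30, 140],  # exact multiple of "loc", so redundant
    "comments": [5, 1, 4, 2, 3, 6],
    "constant": [1, 1, 1, 1, 1, 1],         # zero variance, so irrelevant
}
selected = select_features(columns, labels)
print(selected)
```

In this sketch the threshold of 0.9 is an arbitrary choice; a real study would tune it, and could swap in a different relevance or redundancy measure.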