2018
DOI: 10.1155/2018/2650415

An Improved Method for Cross-Project Defect Prediction by Simplifying Training Data

Abstract: Cross-project defect prediction (CPDP) on projects with limited historical data has attracted much attention. To the best of our knowledge, however, the performance of existing approaches is usually poor, because of low quality cross-project training data. The objective of this study is to propose an improved method for CPDP by simplifying training data, labeled as TDSelector, which considers both the similarity and the number of defects that each training instance has (denoted by defects), and to demonstrate …

Citations: cited by 16 publications (19 citation statements)
References: 44 publications
“…Peng He et al [19] developed TD selector using defects and similarity as a weighted function. He utilized logistic regression as a classifier model and analyzed the effects of several combinations of normalization and similarity of defects on the performance of prediction.…”
Section: Literature Survey (mentioning)
confidence: 99%
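The statement above describes TDSelector as scoring training instances with a weighted function of similarity and defect count. A minimal Python sketch of that idea follows. The exact function is not given on this page, so the linear form, the weight `alpha`, the min-max scaling of defect counts, and the top-k cutoff are illustrative assumptions, as are the names `weighted_scores` and `select_top_k`.

```python
def min_max(values):
    """Scale values to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def weighted_scores(similarities, defect_counts, alpha=0.5):
    """Combine similarity and scaled defect count per training instance.

    Assumed form: score = alpha * similarity + (1 - alpha) * scaled_defects.
    `alpha` is a hypothetical weight, not a value taken from the paper.
    """
    scaled_defects = min_max(defect_counts)
    return [alpha * s + (1 - alpha) * d
            for s, d in zip(similarities, scaled_defects)]


def select_top_k(instances, scores, k):
    """Keep the k instances with the highest scores (assumed selection rule)."""
    order = sorted(range(len(instances)), key=lambda i: scores[i], reverse=True)
    return [instances[i] for i in order[:k]]
```

With `alpha` close to 1 the ranking is driven mostly by similarity to the target project; with `alpha` close to 0 it is driven mostly by how many defects each instance has.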
“…Therefore, we drop the duplicate instances to keep them unique in source data. Different software metrics are usually with different magnitude and several studies 49,50 indicated that simple normalization can improve prediction performance. Therefore, we use Z‐score normalization 56 to scale each metric of the unique source data and the target data to have mean 0 and standard deviation 1 as the previous papers 19,60 did.…”
Section: Research Approach: Wiflf (mentioning)
confidence: 99%
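The preprocessing described in the statement above (dropping duplicate instances, then Z-score normalization of each metric) can be sketched in plain Python. This is only an illustration under stated assumptions: instances are rows of numeric metrics, each dataset is scaled separately with its own mean and standard deviation, the population standard deviation is used, and the names `drop_duplicates` and `zscore_columns` are hypothetical.

```python
from statistics import mean, pstdev


def drop_duplicates(rows):
    """Keep the first occurrence of each instance, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(list(row))
    return unique


def zscore_columns(rows):
    """Scale each metric (column) to mean 0 and standard deviation 1.

    A constant column has zero deviation, so it is mapped to all zeros
    (an assumption made here to avoid division by zero).
    """
    scaled_columns = []
    for col in zip(*rows):
        mu = mean(col)
        sigma = pstdev(col)  # population standard deviation
        if sigma == 0:
            scaled_columns.append([0.0] * len(col))
        else:
            scaled_columns.append([(x - mu) / sigma for x in col])
    return [list(r) for r in zip(*scaled_columns)]
```

A typical call would be `zscore_columns(drop_duplicates(source_rows))` for the source data and `zscore_columns(target_rows)` for the target data.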
“…Finally, Logistic Regression was used for prediction. He et al [ 24 ] simplified the training set by TDSelector method and then classified it by Logistic Regression. Sun et al [ 25 ] proposed a near-some source project selection by collaborative filtering (CFPS) method to filter source items, which has good results using SMO and Random Forest as classifiers.…”
Section: Related Work (mentioning)
confidence: 99%