2021
DOI: 10.3390/s21227535

An Empirical Study of Training Data Selection Methods for Ranking-Oriented Cross-Project Defect Prediction

Abstract: Ranking-oriented cross-project defect prediction (ROCPDP), which ranks software modules of a new target industrial project based on the predicted defect number or density, has been suggested in the literature. A major concern of ROCPDP is the distribution difference between the source project (a.k.a. within-project) data and target project (a.k.a. cross-project) data, which evidently degrades prediction performance. To investigate the impacts of training data selection methods on the performances of ROCPDP models,…

Cited by 3 publications (4 citation statements)
References 53 publications

“…This approach does not require a labeled target dataset, which an organization may not have when building an SDP model. The representative training data selection methods were chosen based on their frequent use in SDP comparative studies [22]-[24], the granularity of the training data, and the selection strategy. Table 5 lists the methods.…”
Section: Training Data Selection Methods
mentioning
confidence: 99%
“…The improvement in the defect prediction performance is small [22]. Studies [23], [24] also found that the SDP model constructed from selected training data underperforms the baseline model. It encourages us to investigate the factors that may affect the efficacy of the training data selection method.…”
mentioning
confidence: 97%
“…Likewise, this approach does not require a labeled target data set, which an organization may not have when building an SDP model. The representative methods used in this study were chosen based on several factors, including their frequent use in SDP comparative studies [11], [13], [21], the granularity of the training data, and the selection strategy. Instance-level Clustering instances in the same cluster as target instance.…”
Section: Training Data Selection Methods
mentioning
confidence: 99%