2017
DOI: 10.1109/tse.2016.2584050

An Empirical Comparison of Model Validation Techniques for Defect Prediction Models

Cited by 453 publications (296 citation statements) | References 96 publications
“…Still in this category, there is a possible threat related to the validation methodology employed. As shown by Tantithamthavorn et al. [114], ten-fold cross-validation might provide unstable results because of the effect of random splitting. To deal with this issue, we repeated the 10-fold cross-validation 100 times; in this way, we substantially reduced the bias due to the validation strategy.…”
Section: Threats to Conclusion Validity
confidence: 99%
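As an illustration of the repeated cross-validation scheme described in this citation statement, here is a minimal sketch in Python with scikit-learn. The dataset, classifier, and AUC scoring are placeholder assumptions for illustration, not details taken from the citing study.

# Minimal sketch: 100x repeated 10-fold cross-validation.
# The synthetic data and logistic regression model are illustrative
# placeholders, not the setup used in the citing paper.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

# Synthetic stand-in for a defect dataset (features X, labels y).
X, y = make_classification(n_samples=500, n_features=20, random_state=42)

# 10 folds repeated 100 times, each repetition with a different random
# split, yielding 1000 per-fold performance estimates.
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=100, random_state=42)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                         scoring="roc_auc", cv=cv)

print(f"mean AUC = {scores.mean():.3f}, sd = {scores.std():.3f}")

Averaging over the 1,000 fold estimates (10 folds × 100 repeats) damps the variance introduced by any single random partition, which is the instability the citing authors guard against.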
“…Table III reports the parameters we tuned for each classifier. For tuning, we followed a GridSearch approach [31] with tuneLength = 5, i.e., the maximum number of different values to be evaluated for each parameter [32], [33].…”
Section: Classification Settings
confidence: 99%
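tuneLength is a parameter of the R caret package referenced in the citing study; a rough Python analogue, sketched below, is an exhaustive grid search in which each tuned parameter receives at most five candidate values. The classifier and parameter grid are hypothetical and not the ones listed in the study's Table III.

# Rough analogue of caret's GridSearch with tuneLength = 5: up to five
# candidate values per tuned parameter. Classifier and grid are
# illustrative assumptions only.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=500, n_features=20, random_state=42)

param_grid = {
    "n_estimators": [20, 50, 100, 200, 400],  # 5 candidate values
    "max_features": [2, 4, 8, 12, 16],        # 5 candidate values
}

# Exhaustive search over the 5x5 grid, scored by AUC under 10-fold CV.
search = GridSearchCV(RandomForestClassifier(random_state=42),
                      param_grid, scoring="roc_auc", cv=10)
search.fit(X, y)
print(search.best_params_, f"AUC = {search.best_score_:.3f}")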
“…To compare the performance of the best-answer prediction models, as suggested by Tantithamthavorn et al. (2017), we use the Scott-Knott ESD test, which groups the models into statistically distinct clusters with a non-negligible difference, at level α = 0.01. The grouping is performed on mean AUC values (i.e., the mean AUC value of the 10 × 10-fold runs for each prediction model).…”
Section: Best-Answer Prediction Within Stack Overflow
confidence: 99%
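The full Scott-Knott ESD test (hierarchical partitioning of group means combined with effect-size-aware merging) is available as Tantithamthavorn's ScottKnottESD R package. The sketch below only illustrates the core grouping idea in a simplified form: models are sorted by mean AUC, and adjacent models are merged into one rank whenever their difference has a negligible Cohen's d (< 0.2). The AUC distributions are made up for the example.

# Simplified, illustrative stand-in for the Scott-Knott ESD grouping
# idea; NOT the full test implemented in the ScottKnottESD R package.
import numpy as np

def cohens_d(a, b):
    # Effect size of the difference between two score samples,
    # using the pooled standard deviation.
    pooled_sd = np.sqrt((np.var(a, ddof=1) + np.var(b, ddof=1)) / 2)
    return abs(np.mean(a) - np.mean(b)) / pooled_sd

def esd_like_ranks(model_scores, negligible=0.2):
    # model_scores: dict mapping model name -> array of AUC values
    # (e.g., from 10x10-fold runs). Returns rank groups, best first.
    ordered = sorted(model_scores, key=lambda m: np.mean(model_scores[m]),
                     reverse=True)
    groups, current = [], [ordered[0]]
    for name in ordered[1:]:
        if cohens_d(model_scores[current[-1]], model_scores[name]) < negligible:
            current.append(name)    # negligible difference: same rank
        else:
            groups.append(current)  # non-negligible: start a new rank
            current = [name]
    groups.append(current)
    return groups

rng = np.random.default_rng(0)
scores = {  # hypothetical AUC samples for three prediction models
    "random_forest": rng.normal(0.80, 0.02, 100),
    "logistic_reg":  rng.normal(0.79, 0.02, 100),
    "naive_bayes":   rng.normal(0.70, 0.02, 100),
}
print(esd_like_ranks(scores))  # rank groups, best first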
“…Yet, given the size of the datasets used in the study and the number of different classifiers compared, we defer the evaluation of feature selection techniques in the domain of best-answer prediction to future work. Finally, we are aware that recent work on defect prediction has shown that the choice of model validation technique (i.e., repeated cross-validation, in this case) may impact the performance estimate (Tantithamthavorn et al. 2017). Given that the datasets used in the study are publicly available, this limitation might be addressed in future independent replications.…”
Section: Threats to Validity
confidence: 99%