Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering
DOI: 10.1145/2491411.2491418

Sample size vs. bias in defect prediction

Abstract: Most empirical disciplines promote the reuse and sharing of datasets, as it leads to greater possibility of replication. While this is increasingly the case in Empirical Software Engineering, some of the most popular bug-fix datasets are now known to be biased. This raises two significant concerns: first, that sample bias may lead to underperforming prediction models, and second, that the external validity of the studies based on biased datasets may be suspect. This issue has raised considerable consternation …
Cited by 92 publications (63 citation statements)
References 24 publications
“…Bachmann et al did not find a relationship between the type of a fault and the likelihood that the fault is linked to a commit [2]. Second, recent evidence suggests that the size of bug datasets influences the accuracy of research studies more than the bias of bug datasets [33]. The severity of the bias threat is therefore reduced by the fact that we used a large number of real faults in our study.…”
Section: Threats To Validity
confidence: 83%
“…Cost effectiveness is often used to evaluate defect prediction approaches [13], [15], [14]. Cost effectiveness is measured by computing the percentage of buggy changes found when reviewing a specific percentage of the lines of code.…”
Section: Cost Effectiveness
confidence: 99%
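The quoted definition of cost effectiveness is terse, so here is a minimal sketch of how it is typically computed in defect prediction studies: changes are ranked by predicted risk, and we measure the fraction of buggy changes recovered after inspecting a fixed budget (e.g. 20%) of the total lines of code. The function name and data layout below are illustrative assumptions, not taken from the cited papers.

```python
# Minimal sketch of cost effectiveness, assuming hypothetical inputs:
# each change has a predicted risk score, a LOC count, and a buggy label.

def cost_effectiveness(changes, loc_budget=0.20):
    """Fraction of buggy changes found when reviewing `loc_budget`
    (e.g. 20%) of the total lines of code, inspecting the
    highest-risk changes first."""
    total_loc = sum(c["loc"] for c in changes)
    total_buggy = sum(1 for c in changes if c["buggy"])
    found, reviewed_loc = 0, 0
    # Inspect changes in decreasing order of predicted risk.
    for c in sorted(changes, key=lambda c: c["risk"], reverse=True):
        if reviewed_loc + c["loc"] > loc_budget * total_loc:
            break
        reviewed_loc += c["loc"]
        found += c["buggy"]
    return found / total_buggy if total_buggy else 0.0

changes = [
    {"risk": 0.9, "loc": 10, "buggy": True},
    {"risk": 0.7, "loc": 50, "buggy": False},
    {"risk": 0.4, "loc": 5,  "buggy": True},
]
print(cost_effectiveness(changes))  # 0.5: one of two buggy changes found within 20% of LOC
```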
“…Threats to construct validity refer to the suitability of our evaluation metrics. We use cost effectiveness and F1-score which are also used by past software engineering studies to evaluate the effectiveness of various prediction techniques [7], [12], [16], [33], [34], [17], [14], [13], [35]. Thus, we believe there is little threat to construct validity.…”
Section: Threats To Validity
confidence: 99%
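The F1-score named in this statement is the standard harmonic mean of precision and recall over the predicted-buggy class; a minimal sketch follows, with illustrative counts that are not from the cited studies.

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall for the buggy class,
    given true positive, false positive, and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# e.g. 30 true positives, 10 false positives, 20 false negatives
print(f1_score(30, 10, 20))  # precision 0.75, recall 0.6 -> F1 ~ 0.667
```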
“…In this paper, we use two well-known metrics to evaluate the performance of a predictive algorithm: cost effectiveness [10], [11], [12], [13] and F-measure [14], [15], [16], [12]. We compare the composite algorithms against the best variant of CODEP, which uses logistic regression as a meta-learner, referred to as CODEP Logistic.…”
Section: Introduction
confidence: 99%