The jinx on the NASA software defect data sets

Petrić, Jean; Bowes, David; Hall, Tracy; Christianson, Bruce; Baddoo, Nathan

doi:10.1145/2915970.2916007

Cited by 45 publications

(26 citation statements)

References 9 publications

“…Shepperd et al [64] raise concerns related to data quality in the NASA datasets. Furthermore, Petrić et al [59] show that problematic data remain in the cleaned NASA datasets. Thus, the quality of the NASA datasets is questionable.…”

Section: Studied Datasets (mentioning)

confidence: 99%

AutoSpearman: Automatically Mitigating Correlated Software Metrics for Interpreting Defect Models

Jiarpakdee

Tantithamthavorn

2018

2018 IEEE International Conference on Software Maintenance and Evolution (ICSME)

The interpretation of defect models heavily relies on software metrics that are used to construct them. However, such software metrics are often correlated in defect models. Prior work often uses feature selection techniques to remove correlated metrics in order to improve the performance of defect models. Yet, the interpretation of defect models may be misleading if feature selection techniques produce subsets of inconsistent and correlated metrics. In this paper, we investigate the consistency and correlation of the subsets of metrics that are produced by nine commonly-used feature selection techniques. Through a case study of 13 publicly-available defect datasets, we find that feature selection techniques produce inconsistent subsets of metrics and do not mitigate correlated metrics, suggesting that feature selection techniques should not be used and correlation analyses must be applied when the goal is model interpretation. Since correlation analyses often involve manual selection of metrics by a domain expert, we introduce AutoSpearman, an automated metric selection approach based on correlation analyses. Our evaluation indicates that AutoSpearman yields the highest consistency of subsets of metrics among training samples and mitigates correlated metrics, while impacting model performance by 1-2%pts. Thus, to automatically mitigate correlated metrics when interpreting defect models, we recommend future studies use AutoSpearman in lieu of commonly-used feature selection techniques.

“…Ghotra et al (Rep [8]) did 2 replication runs of Lessmann et al (Org [5]). The first run was based on uncleaned NASA data (including duplicate and inconsistent instances, see [25]) to confirm if no single classifier is best as in the original (Org [5]). The Friedman test "We used the Scott-Knott test to overcome the confounding issue of overlapping groups that are produced by several other post hoc tests, such as Nemenyi's test [13], which was used by the original study.…”

Section: RQ4: Do Original and Replication Studies In Defect Predictio… (mentioning)

confidence: 99%

“…The curated data by Shepperd et al [26] has been cleaned further by Petrić et al [25]. The data errors found during this further cleaning may have also affected previous models.…”

Section: RQ4: Do Original and Replication Studies In Defect Predictio… (mentioning)

confidence: 99%
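
The two statements above refer to duplicate and inconsistent instances in the NASA data. As a rough illustration only, the sketch below flags both kinds of rows in a small made-up table. It assumes that a "duplicate" is a row whose metric values and label repeat an earlier row, and that "inconsistent" rows share identical metric values but carry different defect labels. The function name, the row layout, and the sample values are all hypothetical and are not taken from the cited papers.

```python
from collections import defaultdict


def find_duplicates_and_inconsistencies(rows):
    """Return (duplicate_indices, inconsistent_indices) for a list of
    (metrics_tuple, defect_label) rows.

    Assumed definitions (not taken from the cited papers):
      - duplicate: same metrics and same label as an earlier row
      - inconsistent: same metrics as another row, but a different label
    """
    groups = defaultdict(list)  # metrics -> [(row index, label), ...]
    for index, (metrics, label) in enumerate(rows):
        groups[tuple(metrics)].append((index, label))

    duplicates, inconsistent = [], []
    for entries in groups.values():
        labels = {label for _, label in entries}
        if len(labels) > 1:
            # Conflicting labels: report every row in the group.
            inconsistent.extend(index for index, _ in entries)
        elif len(entries) > 1:
            # Same label throughout: keep the first row, flag the repeats.
            duplicates.extend(index for index, _ in entries[1:])
    return sorted(duplicates), sorted(inconsistent)


# Hypothetical rows: (metric values, is_defective)
rows = [
    ((10, 2, 5), False),
    ((10, 2, 5), False),  # repeats row 0 exactly
    ((7, 1, 3), True),
    ((7, 1, 3), False),   # same metrics as row 2, different label
    ((4, 1, 1), False),
]
dups, incons = find_duplicates_and_inconsistencies(rows)
print("duplicates:", dups)
print("inconsistent:", incons)
```

Whether flagged rows should be dropped or kept is a separate decision that depends on the study design; the sketch only identifies them.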

Reproducibility and replicability of software defect prediction studies

Mahmood

Bowes

Hall

et al. 2018

Information and Software Technology

Self Cite

Context: Replications are an important part of scientific disciplines. Replications test the credibility of original studies and can separate true results from those that are unreliable. Objective: In this paper we investigate the replication of defect prediction studies and identify the characteristics of replicated studies. We further assess how defect prediction replications are performed and the consistency of replication findings. Method: Our analysis is based on tracking the replication of 208 defect prediction studies identified by a highly cited Systematic Literature Review (SLR) [1]. We identify how often each of these 208 studies has been replicated and determine the type of replication carried out. We identify quality, citation counts, publication venue, impact factor, and data availability from all 208 SLR defect prediction papers to see if any of these factors are associated with the frequency with which they are replicated. Results: Only 13 (6%) of the 208 studies are replicated. Replication seems related to original papers appearing in the Transactions of Software Engineering (TSE) journal. The number of citations an original paper had was also an indicator of replications. In addition, studies conducted using closed source data seem to have more replications than those based on open source data. Where a paper has been replicated, 11 (38%) out of 29 studies revealed different results to the original study. Conclusion: Very few defect prediction studies are replicated. The lack of replication means that it remains unclear how reliable defect prediction is. We provide practical steps for improving the state of replication.

“…• Lines of code should be less than the length of the file [31] (though cumulative code changes or code churn may exceed the length of the file);…”

Section: Evaluating Quality Of Data (mentioning)

confidence: 99%
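
The integrity rule quoted above (lines of code should not exceed the length of the file) can be expressed as a simple per-record check. The sketch below is a minimal illustration under stated assumptions: the field names `lines_of_code` and `file_length` are hypothetical, the sample records are invented, and treating equal values as acceptable is a choice made here rather than something the quoted statement specifies.

```python
def violates_loc_rule(record):
    """Return True when the recorded lines of code exceed the file length.

    Assumption: equality is tolerated (a file with no blank or comment
    lines); only a strictly larger LOC value is flagged.
    """
    return record["lines_of_code"] > record["file_length"]


# Hypothetical records; field names and values are illustrative only.
records = [
    {"module": "a", "lines_of_code": 120, "file_length": 150},
    {"module": "b", "lines_of_code": 200, "file_length": 180},  # suspicious
    {"module": "c", "lines_of_code": 90, "file_length": 90},
]

suspicious = [r["module"] for r in records if violates_loc_rule(r)]
print("records violating the LOC rule:", suspicious)
```

As the quoted statement notes, cumulative measures such as code churn may legitimately exceed the file length, so a check like this would apply only to size metrics taken from a single snapshot.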

Introduction to the EASE 2016 special section: Evidence-based software engineering: Past, present, and future

Beecham

Bowes

Stol

2017

Information and Software Technology

Self Cite

The International Conference on Evaluation and Assessment in Software Engineering (EASE) had its twentieth anniversary in 2016, with that year's edition hosted in Limerick, Ireland. Founded in 1997, the EASE conference was the first event solely dedicated to encouraging empirical research in software engineering, and its founders have been longtime advocates of evidence-based software engineering (EBSE). In this editorial, we briefly look back at the history of EBSE and the EASE conference. We then introduce the four articles which are revised and extended versions of papers presented at EASE 2016. We conclude by looking at the future of EBSE, and provide some suggestions for conducting and reporting empirical research.
