Data Quality: Some Comments on the NASA Software Defect Datasets

Shepperd, Martin; Song, Qinbao; Sun, Zhongbin; Mair, Carolyn

doi:10.1109/tse.2013.11

Cited by 477 publications

(277 citation statements)

References 20 publications


“…There are various mechanisms to facilitate this sharing of data, with the Promise Data Repository [11] being at the forefront of such initiatives. Whilst sharing of data is clearly a good thing it is not without risk, particularly when problems and errors in the data are propagated [12], [13]. However, it does afford us the opportunity to examine the impact of other factors upon defect prediction performance since many different researcher groups have used the same data.…”

Section: Software Defect Prediction (mentioning)

confidence: 99%

Researcher Bias: The Use of Machine Learning in Software Defect Prediction

Shepperd, Bowes, Hall. 2014. IEEE Trans. Software Eng.

Self Cite


Abstract: Background. The ability to predict defect-prone software components would be valuable. Consequently, there have been many empirical studies to evaluate the performance of different techniques endeavouring to accomplish this effectively. However, no one technique dominates and so designing a reliable defect prediction model remains problematic. Objective. We seek to make sense of the many conflicting experimental results and understand which factors have the largest effect on predictive performance. Method. We conduct a meta-analysis of all relevant, high quality primary studies of defect prediction to determine what factors influence predictive performance. This is based on 42 primary studies that satisfy our inclusion criteria that collectively report 600 sets of empirical prediction results. By reverse engineering a common response variable we build a random effects ANOVA model to examine the relative contribution of four model building factors (classifier, data set, input metrics and researcher group) to model prediction performance. Results. Surprisingly we find that the choice of classifier has little impact upon performance (1.3%) and in contrast the major (31%) explanatory factor is the researcher group. It matters more who does the work than what is done. Conclusion. To overcome this high level of researcher bias, defect prediction researchers should (i) conduct blind analysis, (ii) improve reporting protocols and (iii) conduct more intergroup studies in order to alleviate expertise issues. Lastly, research is required to determine whether this bias is prevalent in other application domains.


“…One can safely assume that, with more contextual information about the developers and development processes involved etc., a completely different picture of software quality might emerge. However, the important point is that this assessment accurately captures the author's subjective assessment of the system, when limited to reasoning about a relatively restricted set of metrics of potentially questionable provenance [14]. As will be discussed later, the construction of a more systematic validation study is a part of our ongoing and future work.…”

Section: Results (mentioning)

confidence: 99%

“…Static analysis tools can fail to parse or resolve certain relations in the code, or data that has been collected by hand might apply to a different version of the source code than the one we are assessing. This is a salient point for the CM1 system, where Shepperd et al [14] have highlighted some important inconsistencies in the data-set over different studies that have utilised the data sets.…”

Section: Motivating Example (mentioning)

confidence: 99%

Using evidential reasoning to make qualified predictions of software quality

Walkinshaw. 2013. Proceedings of the 9th International Conference on Predictive Models in Software Engineering.


Software quality is commonly characterised in a top-down manner. High-level notions such as quality are decomposed into hierarchies of sub-factors, ranging from abstract notions such as maintainability and reliability to lower-level notions such as test coverage or team-size. Assessments of abstract factors are derived from relevant sources of information about their respective lower-level sub-factors, by surveying sources such as metrics data and inspection reports. This can be difficult because (1) evidence might not be available, (2) interpretations of the data with respect to certain quality factors may be subject to doubt and intuition, and (3) there is no straightforward means of blending hierarchies of heterogeneous data into a single coherent and quantitative prediction of quality. This paper shows how Evidential Reasoning (ER), a mathematical technique for reasoning about uncertainty and evidence, can address this problem. It enables the quality assessment to proceed in a bottom-up manner, by the provision of low-level assessments that make any uncertainty explicit, and automatically propagating these up to higher-level 'belief-functions' that accurately summarise the developer's opinion and make explicit any doubt or ignorance.


“…BENCHMARK DATASET To investigate the equivalence and the stability of the feature selection methods for noisy SDD, we used eight original version projects of NASA dataset and the corresponding clean version preprocessed by Shepperd et al [17] as our experimental dataset. NASA dataset is a method-level software defect dataset that is characterized by static code metrics [5].…”

Section: Stability Analysis (mentioning)

confidence: 99%

“…For the generalization of our results, we carefully chose the NASA dataset which is commonly used in previous studies in software engineering domain [4], [17], [18], [19], [20]. Besides, previous work also conducted case studies on NASA dataset to investigate the effect of noise on SDD [8], [27].…”

Section: Threats To Validity (mentioning)

confidence: 99%

An Empirical Study on the Equivalence and Stability of Feature Selection for Noisy Software Defect Data

Zhou, Liu, Xia, et al. 2017. International Conferences on Software Engineering and Knowledge Engineering.


Abstract: Software Defect Data (SDD) are used to build defect prediction models for software quality assurance. Existing work employs feature selection to eliminate irrelevant features in the data to improve prediction performance. Previous studies have shown that different feature selection methods do not always yield similar prediction performance on SDD, which indicates that these methods are not equivalent. Also, previous studies have shown that SDD usually contain noise that may interfere with the process of feature selection. In this work, we empirically investigate and measure the equivalence of different feature selection methods for SDD. Further, we intend to analyze the stability of the methods for noisy SDD. We perform statistical analyses on eight projects from the NASA dataset with eight feature selection methods. For the equivalence analysis, we introduce Principal Component Analysis (PCA) and an overlap index to qualitatively and quantitatively analyze the equivalence of these methods, respectively. For the stability analysis, we apply a consistency index to measure the stability of these methods. Experimental results indicate that different feature selection methods are indeed not equivalent to each other, and that the Correlation and Fisher Score methods achieve better stability.
