Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering
DOI: 10.1145/2491411.2491418

Sample size vs. bias in defect prediction

Abstract: Most empirical disciplines promote the reuse and sharing of datasets, as it leads to greater possibility of replication. While this is increasingly the case in Empirical Software Engineering, some of the most popular bug-fix datasets are now known to be biased. This raises two significant concerns: first, that sample bias may lead to underperforming prediction models, and second, that the external validity of the studies based on biased datasets may be suspect. This issue has raised considerable consternation …
Cited by 92 publications (63 citation statements)
References 24 publications
“…Bachmann et al did not find a relationship between the type of a fault and the likelihood that the fault is linked to a commit [2]. Second, recent evidence suggests that the size of bug datasets influences the accuracy of research studies more than the bias of bug datasets [33]. The severity of the bias threat is therefore reduced by the fact that we used a large number of real faults in our study.…”
Section: Threats To Validity
confidence: 83%
“…Cost effectiveness is often used to evaluate defect prediction approaches [13], [15], [14]. Cost effectiveness is measured by computing the percentage of buggy changes found when reviewing a specific percentage of the lines of code.…”
Section: Cost Effectiveness
confidence: 99%
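The quoted definition of cost effectiveness is terse, so here is a minimal sketch of how it is typically computed in defect prediction studies: changes are ranked by predicted risk, and we measure the fraction of buggy changes recovered after inspecting a fixed budget (e.g. 20%) of the total lines of code. The function name and data layout below are illustrative assumptions, not taken from the cited papers.

```python
# Minimal sketch of cost effectiveness, assuming hypothetical inputs:
# each change has a predicted risk score, a LOC count, and a buggy label.

def cost_effectiveness(changes, loc_budget=0.20):
    """Fraction of buggy changes found when reviewing `loc_budget`
    (e.g. 20%) of the total lines of code, inspecting the
    highest-risk changes first."""
    total_loc = sum(c["loc"] for c in changes)
    total_buggy = sum(1 for c in changes if c["buggy"])
    found, reviewed_loc = 0, 0
    # Inspect changes in decreasing order of predicted risk.
    for c in sorted(changes, key=lambda c: c["risk"], reverse=True):
        if reviewed_loc + c["loc"] > loc_budget * total_loc:
            break
        reviewed_loc += c["loc"]
        found += c["buggy"]
    return found / total_buggy if total_buggy else 0.0

changes = [
    {"risk": 0.9, "loc": 10, "buggy": True},
    {"risk": 0.7, "loc": 50, "buggy": False},
    {"risk": 0.4, "loc": 5,  "buggy": True},
]
print(cost_effectiveness(changes))  # 0.5: one of two buggy changes found within 20% of LOC
```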
“…Threats to construct validity refer to the suitability of our evaluation metrics. We use cost effectiveness and F1-score which are also used by past software engineering studies to evaluate the effectiveness of various prediction techniques [7], [12], [16], [33], [34], [17], [14], [13], [35]. Thus, we believe there is little threat to construct validity.…”
Section: Threats To Validity
confidence: 99%
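The F1-score named in this statement is the standard harmonic mean of precision and recall over the predicted-buggy class; a minimal sketch follows, with illustrative counts that are not from the cited studies.

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall for the buggy class,
    given true positive, false positive, and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# e.g. 30 true positives, 10 false positives, 20 false negatives
print(f1_score(30, 10, 20))  # precision 0.75, recall 0.6 -> F1 ~ 0.667
```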
“…In this paper, we use two well-known metrics to evaluate the performance of a predictive algorithm: cost effectiveness [10], [11], [12], [13] and F-measure [14], [15], [16], [12]. We compare the composite algorithms against the best variant of CODEP, which uses logistic regression as a meta-learner, referred to as CODEP Logistic.…”
Section: Introduction
confidence: 99%