Towards logistic regression models for predicting fault-prone code across software projects

Cruz, Ana Erika Camargo; Ochimizu, Koichiro

doi:10.1109/esem.2009.5316002

Cited by 106 publications

(63 citation statements)

References 6 publications


“…The combination of data set selection and a point-wise strategy like the strategy proposed by Turhan et al [18] could improve the efficiency. Additionally, transformation techniques, such as proposed by [5] can also be applied to increase the similarity between data sets and, thereby, the prediction performance.…”

Section: Results (mentioning)

confidence: 99%


Training data selection for cross-project defect prediction

Herbold

2013

Proceedings of the 9th International Conference on Predictive Models in Software Engineering


Software defect prediction has been a popular research topic in recent years and is considered as a means for the optimization of quality assurance activities. Defect prediction can be done in a within-project or a cross-project scenario. The within-project scenario produces results with a very high quality, but requires historic data of the project, which is often not available. For the cross-project prediction, the data availability is not an issue as data from other projects is readily available, e.g., in repositories like PROMISE. However, the quality of the defect prediction results is too low for practical use. Recent research showed that the selection of appropriate training data can improve the quality of cross-project defect predictions. In this paper, we propose distance-based strategies for the selection of training data based on distributional characteristics of the available data. We evaluate the proposed strategies in a large case study with 44 data sets obtained from 14 open source projects. Our results show that our training data selection strategy improves the achieved success rate of cross-project defect predictions significantly. However, the quality of the results still cannot compete with within-project defect prediction.
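The distance-based selection described in this abstract can be illustrated with a minimal sketch. It assumes that each data set is summarized by the mean and standard deviation of every metric and that candidates are ranked by Euclidean distance to the target project; the function names, the toy data, and the choice of k are hypothetical and are not taken from the paper.

```python
from math import dist
from statistics import mean, stdev


def characteristics(dataset):
    """Summarize a data set as a flat vector of per-metric mean and stdev.

    Rows are modules, columns are metrics (an assumed layout).
    """
    columns = list(zip(*dataset))
    return [stat(col) for col in columns for stat in (mean, stdev)]


def select_training_sets(target, candidates, k=2):
    """Return the names of the k candidates closest to the target."""
    target_vec = characteristics(target)
    ranked = sorted(
        candidates,
        key=lambda name: dist(target_vec, characteristics(candidates[name])),
    )
    return ranked[:k]


# Hypothetical toy data: each row is (lines of code, coupling) for one module.
target = [(100, 3), (120, 4), (90, 2)]
candidates = {
    "proj_a": [(110, 3), (95, 4), (105, 2)],
    "proj_b": [(900, 20), (1100, 25), (1000, 30)],
    "proj_c": [(130, 5), (140, 4), (125, 6)],
}

selected = select_training_sets(target, candidates, k=2)
print(selected)
```

In practice the metrics would likely need to be normalized first, since an unscaled Euclidean distance is dominated by metrics with large ranges such as lines of code.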


“…Carmago Cruz and Ochimizu [5] propose not to simply reuse data for cross-project defect prediction, but to transform the data such that the underlying distributions are similar. They observe that such a transformation improves the quality of the predictions.…”

Section: Related Work (mentioning)

confidence: 99%

Training data selection for cross-project defect prediction

Herbold

2013

Proceedings of the 9th International Conference on Predictive Models in Software Engineering


“…We have used CK object-oriented metrics in our study as independent variables. These metrics have already been validated by various studies (El Emam et al, 2001; Gyimothy et al, 2005; Cruz & Ochimizu, 2009; Singh et al, 2010; Glasberg et al, 2000) to judge the effectiveness of these metrics in measuring the concepts they represent.…”

Section: Threats To Validity (mentioning)

confidence: 88%

Fault prediction considering threshold effects of object‐oriented metrics

Malhotra

Bansal

2014

Expert Systems


Software product quality can be enhanced significantly if we have a good knowledge and understanding of the potential faults therein. This paper describes a study to build predictive models to identify parts of the software that have high probability of occurrence of fault. We have considered the effect of thresholds of object‐oriented metrics on fault proneness and built predictive models based on the threshold values of the metrics used. Prediction of fault prone classes in earlier phases of software development life cycle will help software developers in allocating the resources efficiently. In this paper, we have used a statistical model derived from logistic regression to calculate the threshold values of object oriented, Chidamber and Kemerer metrics. Thresholds help developers to alarm the classes that fall outside a specified risk level. In this way, using the threshold values, we can divide the classes into two levels of risk – low risk and high risk. We have shown threshold effects at various risk levels and validated the use of these thresholds on a public domain, proprietary dataset, KC1 obtained from NASA and two open source, Promise datasets, IVY and JEdit using various machine learning methods and data mining classifiers. Interproject validation has also been carried out on three different open source datasets, Ant and Tomcat and Sakura. This will provide practitioners and researchers with well formed theories and generalised results. The results concluded that the proposed threshold methodology works well for the projects of similar nature or having similar characteristics.
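A metric threshold can be derived from a univariate logistic regression model by inverting the model at a chosen risk level. The sketch below uses one common formulation, often called the value of an acceptable risk level (VARL); whether the paper uses exactly this formula is an assumption, and the coefficients and the risk level are hypothetical.

```python
from math import exp, log


def fault_probability(alpha, beta, x):
    """Logistic model: probability that a class with metric value x is faulty."""
    return 1.0 / (1.0 + exp(-(alpha + beta * x)))


def varl(alpha, beta, p0):
    """Metric value at which the modeled fault probability equals p0.

    Obtained by solving p0 = 1 / (1 + exp(-(alpha + beta * x))) for x.
    """
    if not 0.0 < p0 < 1.0:
        raise ValueError("p0 must lie strictly between 0 and 1")
    if beta == 0:
        raise ValueError("beta must be non-zero to invert the model")
    return (log(p0 / (1.0 - p0)) - alpha) / beta


# Hypothetical coefficients for a single metric, and a 5% acceptable risk level.
alpha, beta, p0 = -4.0, 0.15, 0.05
threshold = varl(alpha, beta, p0)
print(round(threshold, 2))

# Classes whose metric value exceeds the threshold are flagged as high risk.
is_high_risk = [value > threshold for value in (2, 8, 30)]
```

Plugging the threshold back into the model returns the chosen risk level, which is a quick consistency check on the inversion.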


“…Unfortunately, such a filtering only reduces the gap between the accuracy of within- and cross-project defect prediction models. Cruz et al [16] studied the application of a data transformation for building and using logistic regression models. They showed that simple log transformations can be useful when measures are not as spread as those measures used in the construction.…”

Section: Related Work (mentioning)

confidence: 99%
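The idea of log-transforming measures before applying a logistic regression model across projects can be sketched as follows. This is an illustration under stated assumptions, not the procedure of the cited paper: the median alignment step, the coefficients, and the sample values are all hypothetical.

```python
from math import exp, log
from statistics import median


def log_transform(values):
    """Compress a skewed, non-negative measure with log(1 + v)."""
    return [log(1 + v) for v in values]


def align_medians(train_log, target_log):
    """Shift the target values so that both medians coincide (an assumption)."""
    shift = median(train_log) - median(target_log)
    return [v + shift for v in target_log]


def predict_fault_probability(x, intercept, slope):
    """Apply a univariate logistic regression model to a transformed measure."""
    return 1.0 / (1.0 + exp(-(intercept + slope * x)))


# Hypothetical size measures: the target project is far more spread out.
train_loc = [50, 80, 120, 200, 400]
target_loc = [500, 900, 1300, 2500, 6000]

train_log = log_transform(train_loc)
target_log = align_medians(train_log, log_transform(target_loc))

# Hypothetical coefficients, as if fitted on the transformed training data.
probs = [predict_fault_probability(x, -6.0, 1.2) for x in target_log]
print([round(p, 3) for p in probs])
```

After the transformation the target values fall in the range the model was built on, so the model is not extrapolating far outside its training data.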

Multi-objective Cross-Project Defect Prediction

Canfora

Lucia

Penta

et al. 2013

2013 IEEE Sixth International Conference on Software Testing, Verification and Validation


Cross-project defect prediction is very appealing because (i) it allows predicting defects in projects for which the availability of data is limited, and (ii) it allows producing generalizable prediction models. However, existing research suggests that cross-project prediction is particularly challenging and, due to heterogeneity of projects, prediction accuracy is not always very good. This paper proposes a novel, multi-objective approach for cross-project defect prediction, based on a multi-objective logistic regression model built using a genetic algorithm. Instead of providing the software engineer with a single predictive model, the multi-objective approach allows software engineers to choose predictors achieving a compromise between number of likely defect-prone artifacts (effectiveness) and LOC to be analyzed/tested (which can be considered as a proxy of the cost of code inspection). Results of an empirical evaluation on 10 datasets from the Promise repository indicate the superiority and the usefulness of the multi-objective approach with respect to single-objective predictors. Also, the proposed approach outperforms an alternative approach for cross-project prediction, based on local prediction upon clusters of similar classes.
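The compromise between effectiveness and inspection cost described in this abstract amounts to choosing among non-dominated predictors. The sketch below only filters a fixed set of candidates down to its Pareto front; it does not implement the genetic algorithm, and the candidate models and their scores are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b on both objectives and better on one.

    Cost (LOC to inspect) is minimized; effectiveness is maximized.
    """
    no_worse = a["cost"] <= b["cost"] and a["effectiveness"] >= b["effectiveness"]
    better = a["cost"] < b["cost"] or a["effectiveness"] > b["effectiveness"]
    return no_worse and better


def pareto_front(models):
    """Keep only the models that no other model dominates."""
    return [m for m in models if not any(dominates(o, m) for o in models)]


# Hypothetical candidate predictors with their two objective values.
models = [
    {"name": "m1", "cost": 1000, "effectiveness": 10},
    {"name": "m2", "cost": 2000, "effectiveness": 18},
    {"name": "m3", "cost": 2500, "effectiveness": 15},  # dominated by m2
    {"name": "m4", "cost": 4000, "effectiveness": 25},
]

front = [m["name"] for m in pareto_front(models)]
print(front)
```

An engineer with a fixed inspection budget would then pick the most effective model on the front whose cost fits that budget.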

