Training data selection for cross-project defect prediction

Herbold, Steffen

doi:10.1145/2499393.2499395

Cited by 122 publications

(103 citation statements)

References 24 publications

Supporting

Mentioning

101

Contrasting

Unclassified


“…(1) Source selection algorithm for software project defect prediction. In cross-project software defect prediction, the distributions of the attribute measures of the source and target projects are correlated to some degree [5], so four distribution characteristics of the data (the maximum, minimum, mean, and standard deviation) are used to define the feature vector of a project. To rank the source projects by their feature similarity to the target project, the distribution characteristics of a software module attribute are expressed as C = {c_max, c_min, c_mean, c_std}. Applying these characteristics to the values of each data attribute of a project S gives c(F_m(S)), which represents the feature vector of project S and can be defined as follows:…”

Section: Software Defect Prediction Modelling For Network Cloud Devel

mentioning

confidence: 99%

Software Defect Prediction Model Research for Network and Cloud Software Development

Yang

2017

Proceedings of the 2017 5th International Conference on Mechatronics, Materials, Chemistry and Computer Engineering (ICMMCCE 20


Abstract. As software development and applications move to networks and the cloud, software development is changing and requires new defect prediction methods. Such methods should address the problems of traditional defect prediction based on the target project, such as the need for the same prediction background and the high cost of defect labelling. A new software defect prediction method based on multi-source data is proposed for network- and cloud-oriented development processes. The method selects candidate source projects whose characteristics are similar to those of the target project, then guides training data selection at the level of software modules, and finally performs the prediction with a Naive Bayes classifier. Experiments show that the method outperforms the traditional within-project (WP) prediction model.
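The source-selection step described in the citation statement above can be sketched as follows. This is a minimal illustration under stated assumptions, not the cited paper's implementation: the function names are hypothetical, and the use of Euclidean distance and of the population standard deviation are choices made here because the excerpt does not specify them.

```python
from math import sqrt
from statistics import mean, pstdev


def characteristic_vector(project):
    """Build the feature vector of a project.

    `project` is a list of modules, each a list of attribute values.
    For every attribute the vector holds [max, min, mean, std], in the
    order used by C = {c_max, c_min, c_mean, c_std}.
    """
    vector = []
    for column in zip(*project):  # one column per attribute
        # Assumption: population standard deviation (pstdev).
        vector.extend([max(column), min(column), mean(column), pstdev(column)])
    return vector


def euclidean(u, v):
    """Euclidean distance between two equally long vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def rank_sources(target, sources):
    """Return source project names, most similar to the target first.

    Assumption: similarity is measured by Euclidean distance between
    the characteristic vectors.
    """
    target_vector = characteristic_vector(target)
    return sorted(
        sources,
        key=lambda name: euclidean(characteristic_vector(sources[name]), target_vector),
    )


if __name__ == "__main__":
    target = [[1.0, 10.0], [3.0, 20.0]]
    sources = {
        "far": [[50.0, 100.0], [70.0, 300.0]],
        "near": [[1.0, 11.0], [3.0, 19.0]],
    }
    print(rank_sources(target, sources))
```

In practice the attributes would typically be normalized first, since attributes with large value ranges otherwise dominate the distance.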

“…The first one is to apply the data filtering method to find the best suitable training data (e.g., [8,9,12,14]). For example, Turhan et al [8] proposed a nearest neighbor (NN) filter to select cross-company data.…”

Section: A Defect Prediction

mentioning

confidence: 99%

Combing Data Filter and Data Sampling for Cross-Company Defect Prediction: An Empirical Study

Zhang et al.

2017

International Conferences on Software Engineering and Knowledge Engineering


Abstract. Cross-company defect prediction (CCDP) is a practical approach that trains a prediction model on one or more projects of a source company and then applies the model to a target company. Unfortunately, a large amount of irrelevant cross-company (CC) data usually makes it difficult to build a prediction model with high performance. In addition, CC data is highly imbalanced between the defect-prone and non-defective classes, which degrades the performance of CCDP. To address these issues, this paper proposes an approach in which data sampling is combined with data filtering. Data sampling seeks a more balanced dataset through the addition or removal of instances, while data filtering removes irrelevant CC data so that the performance of CCDP models can be improved. We employ two data filtering methods, the NN filter and the DBSCAN filter, combined with SMOTE (Synthetic Minority Oversampling Technique) and RUS (Random UnderSampling). Combining these four techniques yields eight approaches: (1) NN filter performed prior to RUS; (2) NN filter performed after RUS; (3) NN filter performed prior to SMOTE; (4) NN filter performed after SMOTE; (5) DBSCAN filter performed prior to RUS; (6) DBSCAN filter performed after RUS; (7) DBSCAN filter performed prior to SMOTE; (8) DBSCAN filter performed after SMOTE. The empirical study was carried out on 15 publicly available project datasets. The experimental results demonstrate that the NN filter performed prior to RUS (Approach 1) performs better than the other seven approaches.
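The best-performing combination named in this abstract, the NN filter followed by RUS, can be sketched as below. This is an illustrative sketch, not the authors' code: the function names are hypothetical, and the NN filter is written in its commonly described form (keep the union of the k nearest source instances of every target instance, using Euclidean distance), with k = 10 as an assumed default.

```python
import random
from math import sqrt


def euclidean(u, v):
    """Euclidean distance between two feature vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def nn_filter(source, target, k=10):
    """Keep the source instances that are among the k nearest
    neighbours of at least one target instance.

    `source` is a list of (features, label) pairs with label 1 for
    defective and 0 for non-defective; `target` is a list of
    unlabeled feature vectors.
    """
    selected = set()
    for t in target:
        by_distance = sorted(
            range(len(source)), key=lambda i: euclidean(source[i][0], t)
        )
        selected.update(by_distance[:k])
    return [source[i] for i in sorted(selected)]


def random_undersample(data, seed=0):
    """Randomly drop majority-class instances until both classes
    have the same number of instances."""
    rng = random.Random(seed)
    defective = [d for d in data if d[1] == 1]
    clean = [d for d in data if d[1] == 0]
    minority, majority = sorted([defective, clean], key=len)
    return minority + rng.sample(majority, len(minority))


def nn_filter_then_rus(source, target, k=10, seed=0):
    """Approach 1 of the abstract: filter first, then undersample."""
    return random_undersample(nn_filter(source, target, k), seed)
```

Swapping the two calls in `nn_filter_then_rus` gives Approach 2 (NN filter performed after RUS); the order matters because undersampling before filtering changes which source instances are available as neighbours.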


“…After Briand et al made an early attempt to validate the applicability of CPDP [14], many researchers in this field have tried to improve the performance of CPDP models using different techniques such as data mining and machine learning. Fortunately, recent studies have shown that it is indeed a feasible method for defect prediction in software projects with different sizes [13, 15–20]. Due to space limitations of this paper, for more details about CPDP approaches, please refer to the latest surveys [6,7].…”

Section: Related Work

mentioning

confidence: 99%

A Ranking-Oriented Approach to Cross-Project Software Defect Prediction: An Empirical Study

You

2016

International Conferences on Software Engineering and Knowledge Engineering


Abstract. In recent years, cross-project defect prediction (CPDP) has become very popular in the field of software defect prediction. Most previous studies treated it as a binary classification or regression problem. However, the existing methods may not be suitable for projects with limited manpower and time. In this paper, we revisit the issue and treat it as a ranking problem. Inspired by the point-wise approach to learning to rank, we propose a ranking-oriented CPDP approach called ROCPDP. The empirical results obtained on AEEEM show that the defect predictor built with our method under a specific CPDP context generally outperforms the predictors trained with the benchmark methods in both CPDP and WPDP (within-project defect prediction) scenarios in terms of two common evaluation metrics for rank correlation. Our work could therefore be an initial attempt to construct new ranking-oriented CPDP models for newly created or inactive projects.
