2015 IEEE/ACM 37th IEEE International Conference on Software Engineering 2015
DOI: 10.1109/icse.2015.139

Online Defect Prediction for Imbalanced Data

Abstract: Many defect prediction techniques have been proposed to improve software reliability. Change classification predicts defects at the change level, where a change is the set of modifications to one file in a commit. In this paper, we conduct the first study of applying change classification in practice. We identify two issues in the prediction process, both of which contribute to low prediction performance. First, the data are imbalanced: there are far fewer buggy changes than clean changes. Second, the commonly …

Citations: cited by 228 publications (209 citation statements)
References: 48 publications
“…These studies showed that oversampling approach helps in achieving better prediction performance when dataset is imbalanced. In addition, most of the reported works (e.g., Li & Wang, 2014; Pelayo & Dick, 2007; Shatnawi, 2012; Tan, Tan, Dara, & Mayeux, 2015) on software fault prediction have used oversampling approach for generating synthetic values. Due to these reasons, we have used an oversampling approach in this study.…”
Section: Resampling the Training Subsets
Citation type: mentioning
confidence: 98%
“…Using the functionality provided by Spark, creating the data splits as required for the approach by Tan et al [71] as well as the training of the defect prediction model was straightforward. After fetching the data from the MongoDB, a Map job was used to prepare the data.…”
Section: Defect Prediction
Citation type: mentioning
confidence: 99%
“…We selected a change-based defect prediction model based on a recent publication by Tan et al [71]. The approach by Tan et al suggests to use the first part of a project as training data, then leave a gap and predict the remainder of the project using a prediction model trained on the first part of the data.…”
Section: Defect Prediction
Citation type: mentioning
confidence: 99%
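
The split described in the statement above (train on the first part of a project's history, leave a gap, predict the remainder) can be sketched as follows. This is a minimal illustration, not the implementation of the cited work: the function name `split_with_gap`, the fraction parameters, and the assumption that changes arrive sorted oldest first are all hypothetical choices made for this example.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_with_gap(
    changes: Sequence[T], train_fraction: float, gap_fraction: float
) -> Tuple[List[T], List[T]]:
    """Split time-ordered changes into (train, test), skipping a gap between them.

    Assumes `changes` is sorted oldest first. The fractions are illustrative
    parameters, not values taken from any cited study.
    """
    if not 0 < train_fraction < 1:
        raise ValueError("train_fraction must be between 0 and 1")
    if gap_fraction < 0 or train_fraction + gap_fraction >= 1:
        raise ValueError("train and gap fractions must leave room for a test set")

    n = len(changes)
    train_end = int(n * train_fraction)              # end of the training window
    test_start = train_end + int(n * gap_fraction)   # skip the gap after training
    return list(changes[:train_end]), list(changes[test_start:])


# Usage: ten changes identified by their position in the project history.
train, test = split_with_gap(list(range(10)), train_fraction=0.5, gap_fraction=0.2)
```

One plausible reason for such a gap in online defect prediction is that the labels of very recent changes are not yet reliable, since the bugs they introduce may not have been found and fixed yet.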
“…For addressing the class imbalance problem in fault prediction, numerous methods have been developed at data and algorithm levels. Data‐level methods include a variety of resampling techniques, such as random undersampling, random oversampling, and SMOTE (Synthetic Minority Over‐sampling TEchnique).…”
Section: Related Work
Citation type: mentioning
confidence: 99%
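
Of the data-level methods named in the statement above, random oversampling is the simplest to illustrate. The sketch below duplicates randomly chosen minority-class examples until every class matches the size of the largest one. It is plain random oversampling, not SMOTE (which synthesizes new points rather than copying existing ones), and the function name `random_oversample` and the "buggy"/"clean" labels are assumptions made for this example.

```python
import random
from collections import Counter
from typing import Hashable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def random_oversample(
    rows: Sequence[T], labels: Sequence[Hashable], seed: int = 0
) -> Tuple[List[T], List[Hashable]]:
    """Duplicate random minority-class rows until all classes are the same size.

    Should be applied to the training set only, so that the test set keeps
    its original class distribution.
    """
    out_rows, out_labels = list(rows), list(labels)
    if not out_labels:
        return out_rows, out_labels

    rng = random.Random(seed)            # fixed seed for reproducible resampling
    counts = Counter(labels)
    target = max(counts.values())        # size of the largest class

    for label, count in counts.items():
        pool = [r for r, l in zip(rows, labels) if l == label]
        extra = rng.choices(pool, k=target - count)  # sample with replacement
        out_rows.extend(extra)
        out_labels.extend([label] * len(extra))
    return out_rows, out_labels


# Usage: four clean changes and one buggy change become four of each.
rows = ["c1", "c2", "c3", "c4", "b1"]
labels = ["clean", "clean", "clean", "clean", "buggy"]
balanced_rows, balanced_labels = random_oversample(rows, labels)
```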