Identifying the change‐prone parts of software can help managers and developers allocate maintenance resources and time effectively during the early phases of the software life cycle. File‐level change‐proneness prediction with binary classification methods makes such identification possible. Because change‐prone files typically account for only a small fraction of all files, standard classification methods perform unsatisfactorily. In this paper, we employ imbalanced learning methods, including bagging, resampling, and especially their combination, to reduce the performance loss that the class imbalance problem causes for standard classifiers in change‐proneness prediction. In addition, we propose a boxplot‐based partition method to assign more appropriate change‐proneness labels to the training data. Eight open‐source Java projects are used in an empirical study to validate the effectiveness of the combination methods. The experimental results show that combining bagging with resampling significantly improves prediction performance over using bagging or resampling alone. Of all the combination methods employed, bagging combined with undersampling performs best. Support vector machine is also more effective as a base classifier than J48 and naive Bayes.
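To make the idea of combining bagging with undersampling concrete, the following is a minimal sketch and not the implementation used in the study. It assumes one common variant: each bag is a bootstrap sample of the minority (change‐prone) class plus an equally sized random draw from the majority class, and the ensemble predicts by majority vote. The single numeric feature (a revision count), the toy data, the function names, and the threshold-based base learner are all hypothetical stand-ins; the study itself uses support vector machine, J48, and naive Bayes as base classifiers.

```python
import random


def balanced_bag(rows, labels, rng):
    """Build one bag: bootstrap the minority class, undersample the majority."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    # Minority: sampled with replacement (bagging step).
    # Majority: cut down to the minority size (undersampling step).
    chosen = rng.choices(minority, k=len(minority)) + rng.sample(majority, len(minority))
    return [rows[i] for i in chosen], [labels[i] for i in chosen]


def train_stump(rows, labels):
    """Hypothetical base learner: a threshold halfway between the class means."""
    pos = [x for x, y in zip(rows, labels) if y == 1]
    neg = [x for x, y in zip(rows, labels) if y == 0]
    threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1 if x > threshold else 0


def fit_underbagging(rows, labels, n_bags=11, seed=0):
    """Train one base learner per balanced bag."""
    rng = random.Random(seed)
    return [train_stump(*balanced_bag(rows, labels, rng)) for _ in range(n_bags)]


def predict(models, x):
    """Majority vote over the ensemble; 1 means change-prone."""
    votes = sum(model(x) for model in models)
    return 1 if votes * 2 > len(models) else 0


# Toy, made-up data: 20 stable files and 3 change-prone files,
# each described by a hypothetical revision count.
rows = [1, 2, 3, 4, 5] * 4 + [20, 25, 30]
labels = [0] * 20 + [1] * 3

models = fit_underbagging(rows, labels)
print(predict(models, 2), predict(models, 40))
```

Because every bag is balanced, no single base learner is trained on the skewed class distribution, while voting across bags recovers some of the majority-class information that any one undersampled bag discards.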