In this paper we studied re-sampling methods for learning classifiers from imbalanced data. We carried out a series of experiments on artificial data sets to explore the impact of noisy and borderline examples from the minority class on classifier performance. The results showed that if the data were sufficiently disturbed by these factors, the focused re-sampling methods (NCR and our SPIDER2) strongly outperformed the oversampling methods. They were also better on real-life data, where PCA visualizations suggested the possible existence of noisy examples and large overlapping areas between classes.
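To make "noisy" and "borderline" minority examples concrete, the sketch below types each example by how many of its k nearest neighbours share its class. It is only an illustration under stated assumptions: Euclidean distance, k = 3, and the hypothetical rule "all neighbours agree → safe, none agree → noisy, otherwise borderline". This is not necessarily the definition used in the experiments described above.

```python
import math


def k_nearest(points, idx, k):
    """Indices of the k nearest neighbours of points[idx] (Euclidean), excluding itself.
    Ties in distance are broken by index."""
    order = sorted((math.dist(points[idx], p), j) for j, p in enumerate(points) if j != idx)
    return [j for _, j in order[:k]]


def example_type(points, labels, idx, k=3):
    """Hypothetical typing rule (an assumption, not the paper's definition):
    safe if all k neighbours share the label, noisy if none do, borderline otherwise."""
    same = sum(1 for j in k_nearest(points, idx, k) if labels[j] == labels[idx])
    if same == k:
        return "safe"
    if same == 0:
        return "noisy"
    return "borderline"
```

For example, a minority point placed inside a tight majority cluster is typed as noisy, while a minority point sitting midway between the two clusters comes out as borderline.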
Abstract. In this paper we discuss problems of constructing classifiers from imbalanced data. We describe a new approach to selective preprocessing of imbalanced data which combines local over-sampling of the minority class with filtering of difficult examples from the majority classes. In experiments focused on rule-based and tree-based classifiers we compare our approach with two other related preprocessing methods: NCR and SMOTE. The results show that NCR is too strongly biased toward the minority class, which leads to deteriorated specificity and overall accuracy, while SMOTE and our approach do not demonstrate such behavior. Analysis of the degree to which the original class distribution has been modified also reveals that our approach introduces less extensive changes than SMOTE.
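The combination described above (local over-sampling of the minority class plus filtering of difficult majority examples) can be sketched roughly as follows. This is a simplified illustration, not the authors' algorithm. Several choices here are assumptions: an example counts as "safe" when most of its 3 nearest neighbours share its label; noisy majority examples are removed; and each noisy minority example is copied once per safe majority example among its neighbours. The function names are hypothetical.

```python
import math


def knn(points, idx, k=3):
    """Indices of the k nearest neighbours of points[idx] (Euclidean, ties by index)."""
    order = sorted((math.dist(points[idx], p), j) for j, p in enumerate(points) if j != idx)
    return [j for _, j in order[:k]]


def is_safe(points, labels, idx, k=3):
    """Assumed rule: safe if a majority of the k neighbours share the example's label."""
    same = sum(labels[j] == labels[idx] for j in knn(points, idx, k))
    return same > k // 2


def selective_preprocess(points, labels, minority, k=3):
    """Hypothetical simplified sketch: drop noisy majority examples and amplify
    noisy minority examples by copying. Flags are computed once on the original data."""
    safe = [is_safe(points, labels, i, k) for i in range(len(points))]
    new_points, new_labels = [], []
    for i, (p, y) in enumerate(zip(points, labels)):
        if y != minority and not safe[i]:
            continue  # filter a difficult (noisy) majority example
        new_points.append(p)
        new_labels.append(y)
        if y == minority and not safe[i]:
            # local over-sampling: one extra copy per safe majority neighbour
            copies = sum(1 for j in knn(points, i, k) if labels[j] != minority and safe[j])
            new_points.extend([p] * copies)
            new_labels.extend([y] * copies)
    return new_points, new_labels
```

Unlike SMOTE, which creates new synthetic examples, this kind of sketch only duplicates and removes existing examples near difficult regions. That is one reason such an approach may change the class distribution less.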
This paper briefly describes the ROSE software package. It is an interactive, modular system designed for analysis and knowledge discovery based on rough set theory, running under 32-bit operating systems on PC computers. It implements classical rough set theory as well as its extension based on the variable precision model. It includes generation of decision rules for classification systems and knowledge discovery.

L. Polkowski and A. Skowron (Eds.): RSCTC'98
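The sketch below illustrates the difference between the classical model and the variable precision model. It computes a lower approximation in which an equivalence class is accepted when its inclusion degree |E ∩ X| / |E| reaches a threshold beta. Setting beta = 1.0 gives the classical lower approximation. This parametrization is one common formulation and is an assumption here. The abstract does not describe ROSE's own interface or parameters, and the function names are hypothetical.

```python
from collections import defaultdict


def equivalence_classes(table, attributes):
    """Group object ids by their values on the chosen condition attributes."""
    classes = defaultdict(set)
    for obj, row in table.items():
        classes[tuple(row[a] for a in attributes)].add(obj)
    return list(classes.values())


def vp_lower_approximation(table, attributes, concept, beta=1.0):
    """Union of equivalence classes E whose inclusion degree |E & X| / |E| is at least beta.
    beta = 1.0 reproduces the classical rough set lower approximation."""
    lower = set()
    for eq in equivalence_classes(table, attributes):
        if len(eq & concept) / len(eq) >= beta:
            lower |= eq
    return lower
```

With a relaxed threshold, an equivalence class that is "mostly" inside the concept (for example, 3 of its 4 objects) enters the lower approximation. Under the classical model it would be excluded.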