Imbalanced data pose challenges in many real-world applications. One approach to improving classifier performance on imbalanced data is oversampling. In this paper, we propose a new selective oversampling approach (SOA) that first isolates the most representative samples from minority classes by using an outlier detection technique and then uses these samples for synthetic oversampling. We show that the proposed approach improves the performance of two state-of-the-art oversampling methods, namely the synthetic minority oversampling technique (SMOTE) and adaptive synthetic sampling (ADASYN). Prediction performance is evaluated on four synthetic datasets and four real-world datasets, and the proposed SOA methods always achieved the same or better performance than the other oversampling methods considered.
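The two-stage idea described above (filter the minority class, then oversample only from the retained samples) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the outlier detection technique, so the mean k-nearest-neighbour distance used here as an outlier score is an assumption, as are the function names, the `keep_fraction` parameter, and the choice to keep all original samples in the output. The interpolation step is SMOTE-like and assumes a binary problem.

```python
import numpy as np


def knn(X, k):
    """Indices and distances of the k nearest neighbours of each row (self excluded)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    idx = np.argsort(d, axis=1)[:, :k]
    return idx, np.take_along_axis(d, idx, axis=1)


def select_representative(X_min, k=5, keep_fraction=0.8):
    """Stage 1 (assumed scoring): keep minority samples with the smallest mean k-NN distance."""
    k = min(k, len(X_min) - 1)
    _, dist = knn(X_min, k)
    score = dist.mean(axis=1)  # large score = isolated sample = likely outlier
    n_keep = max(2, int(np.ceil(keep_fraction * len(X_min))))
    return X_min[np.argsort(score)[:n_keep]]


def interpolate(X_sel, n_new, k=5, seed=0):
    """Stage 2: SMOTE-like interpolation between a sample and one of its k neighbours."""
    rng = np.random.default_rng(seed)
    k = min(k, len(X_sel) - 1)
    idx, _ = knn(X_sel, k)
    base = rng.integers(0, len(X_sel), size=n_new)
    neighbour = idx[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))
    return X_sel[base] + gap * (X_sel[neighbour] - X_sel[base])


def selective_oversample(X, y, minority_label, keep_fraction=0.8, k=5, seed=0):
    """Balance a binary dataset using synthetic points built only from retained samples."""
    X_min = X[y == minority_label]
    n_new = max(0, int((y != minority_label).sum()) - len(X_min))
    X_sel = select_representative(X_min, k=k, keep_fraction=keep_fraction)
    X_syn = interpolate(X_sel, n_new, k=k, seed=seed)
    # Original samples (including filtered-out ones) are kept; only synthesis is selective.
    return np.vstack([X, X_syn]), np.concatenate([y, np.full(n_new, minority_label)])
```

Because each synthetic point is a convex combination of two retained samples, an isolated minority sample that is filtered out in the first stage cannot pull synthetic points toward itself, which is the intuition behind selecting samples before oversampling.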
Bankruptcy prediction is a long-standing problem that receives significant attention from academic researchers and industry practitioners. Most papers on bankruptcy prediction focus on companies that are listed on the stock market, and only limited data are available for the remaining companies. These companies, which are not listed on any stock market, represent a significant part of the economy. The presented dataset consists of financial ratios of Slovak companies. There are 21 distinct financial ratios, available for the three consecutive years prior to the evaluation year, in which a company may or may not have filed for bankruptcy. The companies come from four industries: agriculture, construction, manufacturing, and retail. We provide data for four consecutive years, 2013–2016, for each industry. All companies are categorized as small and medium-sized enterprises according to the EU classification. Prediction performance results on this dataset are published in the research paper “Bankruptcy prediction for small- and medium-sized companies using severely imbalanced datasets” (Zoričák et al., 2019).