INTRODUCTION: Nowadays, many machine learning tasks involve learning from imbalanced datasets, leading to the miss-classification of the minority class. One of the state-of-the-art approaches to "solve" this problem at the data level is Synthetic Minority Over-sampling Technique (SMOTE) which in turn uses K-Nearest Neighbors (KNN) algorithm to select and generate new instances.OBJECTIVES: This paper presents SMOTE-Cov, a modified SMOTE that use Covariance Matrix instead of KNN to balance datasets, with continuous attributes and binary class.
METHODS:We implemented two variants SMOTE-CovI, which generates new values within the interval of each attribute and SMOTE-CovO, which allows some values to be outside the interval of the attributes.
RESULTS:The results show that our approach has a similar performance as the state-of-the-art approaches. CONCLUSION: In this paper, a new algorithm is proposed to generate synthetic instances of the minority class, using the Covariance Matrix.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.