Imbalanced learning problems arise when data samples are unevenly distributed across classes, which poses a challenge for classifiers. The synthetic minority oversampling technique (SMOTE) is a widely used preprocessing method that synthesizes new data to balance the number of samples in each class. One family of SMOTE extensions is based on the initial selection approach, which determines the best candidates for oversampling before synthetic example generation starts. However, SMOTE and most existing oversampling methods based on initial selection still produce overlapping data in the final result. This overlap makes it difficult for classifiers to determine the decision boundary of each class. Therefore, this research proposes a new oversampling technique called Radius-SMOTE, which emphasizes the initial selection approach by creating synthetic data based on a safe radius distance. The safe radius distance prevents new synthetic data from overlapping with the opposite class. Radius-SMOTE was evaluated extensively on thirteen artificial imbalanced datasets from the KEEL repository. The experimental results show that the proposed method achieves the best results on 5 datasets, namely yeast-1-4-5-8_vs_7, ecoli-0-1-3-7_vs_2-6, Umbilical cord, Pima, and Haberman, in terms of various assessment metrics. In addition, the computational cost of the proposed method is relatively low, with an average time of 0.5 to 1 second on the 13 tested datasets.
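The abstract does not define the safe radius, so the following is only a minimal sketch of the general idea, not the published Radius-SMOTE algorithm. It assumes the safe radius of a minority sample is its distance to the nearest majority sample, and it draws each synthetic point within half of that radius so the point stays closer to its seed than to any majority sample. The function name `radius_oversample` and the `shrink` parameter are hypothetical.

```python
import math
import random


def radius_oversample(minority, majority, n_new, shrink=0.5, seed=0):
    """Draw n_new synthetic minority points inside per-sample safe radii.

    Assumption (not taken from the abstract): the safe radius of a minority
    sample is its Euclidean distance to the nearest majority sample.
    """
    rng = random.Random(seed)
    radii = [min(math.dist(p, q) for q in majority) for p in minority]
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        centre, radius = minority[i], radii[i]
        # Random direction: normalise a vector of Gaussian draws.
        direction = [rng.gauss(0.0, 1.0) for _ in centre]
        norm = math.sqrt(sum(d * d for d in direction)) or 1.0
        # Step length is strictly below shrink * radius (random() < 1).
        step = rng.random() * shrink * radius
        synthetic.append(
            tuple(c + step * d / norm for c, d in zip(centre, direction))
        )
    return synthetic
```

With `shrink=0.5`, the triangle inequality guarantees that every synthetic point lies closer to its seed sample than to any majority sample, which is one simple way to keep new points out of the opposite class.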
The umbilical cord is an organ that circulates oxygen and nutrients from mother to fetus during pregnancy. This study aims to classify the umbilical cord based on ultrasound images. The similarity in shape and coiling between the classes makes this task challenging, so it requires feature values that are relevant to the characteristics of the three classes. The imbalanced data set in this study is a further obstacle, because it degrades the classifier's performance on minority classes. Therefore, this study proposes a machine learning model capable of properly dealing with imbalanced data sets and recognizing the umbilical cord class. Furthermore, this study proposes a new feature extraction method, namely the umbilical coiling index (UCI), which directly adopts obstetricians' knowledge. The proposed model consists of five stages: image preprocessing, feature extraction, feature selection, oversampling with SMOTE, and classification. The machine learning methods were evaluated comprehensively using five base classifiers (Random Forest, KNN, Decision Tree, SVM, and Naïve Bayes) and a Multiclassifier. The results showed that the Random Forest and Multiclassifier methods provide the highest accuracy, precision, recall, and F-measure on imbalanced data sets.
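The oversampling stage of the pipeline above can be illustrated with a minimal sketch of the basic SMOTE interpolation step: a synthetic point is placed on the line segment between a minority sample and one of its k nearest minority neighbours. This is a generic illustration, not the study's code; the function name `smote`, its parameters, and the tuple-based data layout are assumptions.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    Expects at least two minority samples, each a tuple of floats.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base sample, excluding itself.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        mate = rng.choice(neighbours)
        # Interpolate at a random position between base and neighbour.
        gap = rng.random()
        synthetic.append(tuple(b + gap * (m - b) for b, m in zip(base, mate)))
    return synthetic
```

Calling `smote(minority, len(majority) - len(minority))` and appending the result to the minority class gives both classes the same number of samples before the classifier is trained. In a pipeline like the one described, this step would normally be applied to the training split only.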