The classification of imbalanced datasets has attracted sustained research attention over the years. Such datasets arise in many fields, for example security, finance, and health. Imbalanced datasets are commonly rebalanced by resampling, and various solutions have been designed that mainly address the class distribution. This paper introduces a two-stage technique for balancing data: first, an oversampling step uses single-point crossover to generate new samples for the minority classes; second, Jellyfish Search (JS), an optimization method, searches for an optimal subset of the imbalanced and balanced datasets. Experiments are performed on 18 real imbalanced datasets, and the results are compared with well-known oversampling methods and the recently published ACOR (Ant Colony Optimization Resampling) method using several evaluation measures. The proposed method records higher performance in the experiments and remains comparable with well-known and recent techniques.
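The first stage described above can be illustrated with a minimal sketch of single-point crossover oversampling. This is not the authors' implementation: the function name `single_point_crossover_oversample`, the random choice of parents and cut point, and the restriction to numeric features are all assumptions made for illustration, and the Jellyfish Search stage is not shown.

```python
import numpy as np


def single_point_crossover_oversample(X_min, n_new, seed=None):
    """Create n_new synthetic minority samples by single-point crossover.

    Hypothetical helper: parents and cut points are drawn uniformly at
    random, which is an assumption rather than the paper's exact procedure.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n_samples, n_features = X_min.shape
    if n_samples < 2 or n_features < 2:
        raise ValueError("need at least two minority samples and two features")

    children = np.empty((n_new, n_features))
    for i in range(n_new):
        # Pick two distinct minority-class parents.
        a, b = rng.choice(n_samples, size=2, replace=False)
        # Cut point in 1..n_features-1, so each parent contributes a feature.
        cut = rng.integers(1, n_features)
        children[i, :cut] = X_min[a, :cut]   # head from the first parent
        children[i, cut:] = X_min[b, cut:]   # tail from the second parent
    return children


# Two toy minority samples; every child mixes features from both.
X_minority = np.array([[0.0, 0.0, 0.0],
                       [1.0, 1.0, 1.0]])
synthetic = single_point_crossover_oversample(X_minority, n_new=5, seed=0)
```

Because each child copies whole feature values from existing minority samples, the synthetic points stay inside the set of observed values per feature, unlike interpolation-based oversampling.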
Because electronic health databases are widely used in healthcare services, healthcare data are available to classification researchers for making disease diagnosis more efficient. However, classifying medical data is challenging because the data are often imbalanced. Most proposed algorithms tend to classify samples into the majority class, which results in poor prediction of the minority class. In this paper, a novel preprocessing method is proposed that uses boosting and crossover to optimize the ratio of the two classes by progressively rebuilding the training dataset. Experiments on seven real-world medical datasets with different imbalance ratios and various distributions show that this approach performs better than other state-of-the-art ensemble methods.
Prediction using machine learning has evolved due to its impact on providing valuable and intuitive feedback. It has covered a wide range of areas for predicting students’ performance. Instructors can track student dropout in a particular course at an early stage and try to improve students’ performance. Predicting students’ future performance using advanced statistics and machine learning is hard because student data are imbalanced: the number of students who passed the exam is generally much higher than the number who failed. This paper proposes a new crossover operator, called Even-Odd crossover, that generates new instances in the minority class to handle the imbalanced data problem. To assess the efficiency of the proposed technique, experiments are run with three machine learning (ML) algorithms: random forest (RF), support vector machines (SVM), and K-Nearest-Neighbor (KNN). The classifiers are evaluated using several performance measures. The ability of the proposed method to solve the imbalance problem is demonstrated by experiments on 22 real-world datasets from different fields and four student datasets. The proposed Even-Odd crossover shows superior performance compared to state-of-the-art resampling techniques.
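The abstract does not define the Even-Odd crossover operator, so the following is only a sketch of one plausible reading: a child takes the even-indexed features from one minority parent and the odd-indexed features from the other. The function name `even_odd_crossover` and this interpretation are assumptions and may differ from the operator in the paper.

```python
import numpy as np


def even_odd_crossover(parent_a, parent_b):
    """Return two children that interleave features of two parents.

    Assumed interpretation (not confirmed by the source): child_1 takes
    even-indexed features from parent_a and odd-indexed features from
    parent_b; child_2 takes the complementary features.
    """
    a = np.asarray(parent_a, dtype=float)
    b = np.asarray(parent_b, dtype=float)
    if a.ndim != 1 or a.shape != b.shape:
        raise ValueError("parents must be 1-D vectors of equal length")

    even = np.arange(a.size) % 2 == 0     # True at positions 0, 2, 4, ...
    child_1 = np.where(even, a, b)
    child_2 = np.where(even, b, a)
    return child_1, child_2


# Toy minority-class parents with four features each.
c1, c2 = even_odd_crossover([1, 2, 3, 4], [10, 20, 30, 40])
# c1 is [1, 20, 3, 40] and c2 is [10, 2, 30, 4]
```

Under this reading the operator is deterministic for a given pair of parents, so any randomness in the oversampling would come from how the parent pairs are selected.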