2023
DOI: 10.3390/info14010054

A Comparison of Undersampling, Oversampling, and SMOTE Methods for Dealing with Imbalanced Classification in Educational Data Mining

Abstract: Educational data mining is capable of producing useful data-driven applications (e.g., early warning systems in schools or the prediction of students’ academic achievement) based on predictive models. However, the class imbalance problem in educational datasets could hamper the accuracy of predictive models as many of these models are designed on the assumption that the predicted class is balanced. Although previous studies proposed several methods to deal with the imbalanced class problem, most of them focuse…

Citations: cited by 128 publications (62 citation statements)
References: 33 publications
“…given by some literature is exactly 40% of the total sample size of positive cases [20,37,38], Of course, the SMOTE-NC also has limitations, although it has been proven to perform well in many fields. [39] Therefore, when we need to balance the categories of clinical problems, maybe RF plus SMOTE-NC is a better combination. Caution is advised in interpreting this conclusion, and further evidence is warranted for confirmation in the immediate future.…”
Section: Discussion (citation type: mentioning)
confidence: 99%
“…SMOTE [24] is a classical oversampling method to solve the data imbalance problem. It uses the idea of linear interpolation to construct minority samples in the nearest neighborhood.…”
Section: Resampling Methods (citation type: mentioning)
confidence: 99%
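
The linear-interpolation idea in the statement above can be illustrated with a minimal sketch. It is not the implementation used by the cited papers or by any library: the function name `smote_sketch`, its parameters, and the toy data are hypothetical, and it assumes purely numeric features and Euclidean distance.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    Hypothetical helper for illustration only; assumes numeric feature tuples.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the remaining minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        # Choose one of the k nearest neighbours at random.
        neighbour = rng.choice(others[:k])
        # Place the new point on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy minority class with two features (made-up values).
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote_sketch(minority, n_new=6, k=2)
```

Because each synthetic point lies on a segment between two existing minority samples, it never leaves the region spanned by the minority class, which is the sense in which the samples are constructed "in the nearest neighborhood".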
“…Thirdly, compared to other methods, evolutionary features achieve best results, but the corresponding feature dimensions are also relatively higher, especially for the M495 dataset. Fourthly, the predictive accuracy of New-All-2 is not only best but also has lower feature dimensions, outperforming the combined methods of Group1, Group2, and Group3: To deal with the problem of sample imbalance, four different sampling methods were employed for comparison-SMOTE [24], SMOTEENN [25], ADASYN [28], and SMOTETOMEK [29]. The results are presented in Table 7.…”
Section: Effect Of Various Feature Extraction Methods (citation type: mentioning)
confidence: 99%
“…As expected, the brightest students are enrolled in the medicine program, so their GPA is high. However, the class imbalance problem in educational datasets could hamper the accuracy of predictive models as many of these models are designed on the assumption that the predicted class is balanced [27]. Therefore, to have an equitable class distribution for the target variable GPA, students were divided into two classes: “excellent,” having a GPA more than 4.25, and “good,” having a GPA less than 4.25 but more than 2.…”
Section: Methods (citation type: mentioning)
confidence: 99%
“…The EDM community uses four major approaches: prediction models, structure discovery, relationship mining and discovery with models [26]. Educational data mining is capable of producing useful data-driven applications (e.g., early warning systems in schools or the prediction of students’ academic achievement) based on predictive models [27]. Researchers at King Khalid University (KKU) have used Orange data mining tool for EDM for detecting patterns and predicting academic performance of students using online courses offered through learning management systems [28].…”
Section: Introduction (citation type: mentioning)
confidence: 99%