2021
DOI: 10.26594/register.v7i1.2206

An in-depth performance analysis of the oversampling techniques for high-class imbalanced dataset

Abstract: Class imbalance occurs when the distribution of classes between the majority and the minority classes is not the same. The data on imbalanced classes may vary from mild to severe. The effect of high-class imbalance may affect the overall classification accuracy since the model is most likely to predict most of the data that fall within the majority class. Such a model will give biased results, and the performance predictions for the minority class often have no impact on the model. The use of the oversampling…

Cited by 13 publications (5 citation statements)
References 24 publications (25 reference statements)
“…This dataset contains 100,000 movie ratings by MovieLens users and consists of three files: a rating file, a movie information file, and a user file. Systematic sampling is a random sampling method that involves selecting sample units at certain intervals from the desired target population [15]. This method is carried out by determining the interval between units taken randomly from the population, and then choosing the first unit at random.…”
Section: Methods
Citation type: mentioning
confidence: 99%
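
The systematic sampling procedure described in the statement above (fix an interval, pick the first unit at random, then take every k-th unit) could be sketched roughly as follows. This is an illustration only, not the citing paper's code: the function name `systematic_sample`, the `seed` parameter, and the integer list standing in for the 100,000 rating rows are all assumptions.

```python
import random


def systematic_sample(population, sample_size, seed=None):
    """Pick every k-th unit after a random start inside the first interval."""
    n = len(population)
    if not 0 < sample_size <= n:
        raise ValueError("sample_size must be between 1 and len(population)")
    rng = random.Random(seed)
    interval = n // sample_size      # sampling interval k
    start = rng.randrange(interval)  # random first unit, index in [0, k)
    return [population[start + i * interval] for i in range(sample_size)]


# Hypothetical stand-in for the 100,000 rating rows (not the real dataset).
ratings = list(range(100_000))
sample = systematic_sample(ratings, sample_size=1_000, seed=42)
```

With 100,000 rows and a sample of 1,000, the interval is 100, so consecutive sampled rows are always 100 positions apart and only the starting row is random.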
“…An unbalanced data set, where the number of instances in each category is significantly different, can lead to poor classification results. The class imbalance occurs when the distribution of classes between the majority and minority classes is different [33]. In the case of our data set consisting of seven unbalanced categories, removing categories with little data (truncated, impossible to check, and other) and replacing partially true categories with true and partially false categories with false ones is a valid approach to balancing the data set.…”
Section: Algorithm 1 the Process Of Web Scraper Design
Citation type: mentioning
confidence: 99%
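
The relabeling step described in the statement above (dropping the three sparse categories and folding the "partially" categories into true and false) might look like the sketch below. The exact label strings, the helper name `collapse_labels`, and the example counts are assumptions made for illustration; the citing paper's actual labels and data are not shown here.

```python
from collections import Counter

# Assumed label spellings; the citing paper's exact strings are not given.
DROP = {"truncated", "impossible to check", "other"}
MERGE = {"partially true": "true", "partially false": "false"}


def collapse_labels(labels):
    """Drop sparse categories, then fold 'partially' labels into true/false."""
    kept = (label for label in labels if label not in DROP)
    return [MERGE.get(label, label) for label in kept]


# Hypothetical label counts, chosen only to show the mechanics.
labels = (
    ["true"] * 5
    + ["false"] * 4
    + ["partially true"] * 2
    + ["partially false"] * 3
    + ["truncated", "other", "impossible to check"]
)
counts = Counter(collapse_labels(labels))
```

Whether the two remaining classes end up balanced depends entirely on the original counts; the merge reduces seven categories to two but does not equalize them by itself.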
“…This method synthesizes data adaptively based on the distribution of positive samples [15]. The advantage of ADASYN is that it can focus data duplication on only one specific area [16], where samples are produced more in areas with low minority sample densities than in areas with high densities. This increase in distribution can reduce data imbalances and help improve classification [17].…”
Section: Adaptive Synthetic Sampling (Adasyn)
Citation type: mentioning
confidence: 99%
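
To make the density-based behaviour in the statement above concrete, here is a simplified, pure-Python sketch of the ADASYN idea: each minority sample gets a weight equal to the share of majority points among its k nearest neighbours, and synthetic points are interpolated toward minority neighbours in proportion to that weight. It assumes a binary problem, Euclidean distance, and a tiny hand-made dataset; the name `adasyn_sketch` and its parameters are hypothetical, and a maintained library implementation would be preferable in practice.

```python
import math
import random


def adasyn_sketch(X, y, minority=1, k=3, beta=1.0, seed=0):
    """Return synthetic minority points, concentrated where majority neighbours dominate."""
    rng = random.Random(seed)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    n_min = len(min_idx)
    n_maj = len(y) - n_min
    if n_min < 2:
        raise ValueError("need at least two minority samples to interpolate")
    total_new = int((n_maj - n_min) * beta)  # number of points to synthesize

    def neighbours(i, candidates):
        # Indices of the k candidates closest to X[i], excluding i itself.
        others = [j for j in candidates if j != i]
        return sorted(others, key=lambda j: math.dist(X[i], X[j]))[:k]

    # r_i: share of majority points among the k nearest neighbours of x_i.
    ratios = []
    for i in min_idx:
        nn = neighbours(i, range(len(X)))
        ratios.append(sum(y[j] != minority for j in nn) / k)
    norm = sum(ratios)
    if norm == 0 or total_new <= 0:
        return []

    synthetic = []
    for i, r in zip(min_idx, ratios):
        g_i = round(r / norm * total_new)  # this sample's share of new points
        min_nn = neighbours(i, min_idx)    # nearest minority neighbours only
        for _ in range(g_i):
            j = rng.choice(min_nn)
            lam = rng.random()             # interpolation factor in [0, 1)
            synthetic.append(
                tuple(a + lam * (b - a) for a, b in zip(X[i], X[j]))
            )
    return synthetic


# Hypothetical data: a tight minority cluster plus one minority point
# sitting inside a majority region.
X = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),  # minority cluster
    (5.0, 5.0),                                      # isolated minority point
    (5.1, 5.0), (5.0, 5.1), (4.9, 5.0), (5.0, 4.9),  # majority
    (5.2, 5.2), (4.8, 4.8), (5.3, 5.0), (5.0, 5.3),  # majority
]
y = [1] * 5 + [0] * 8
synthetic = adasyn_sketch(X, y)
```

In this toy data the four clustered minority points have only minority neighbours, so their weight is zero, and all three synthetic points are generated from the isolated minority point at (5.0, 5.0), where minority density is lowest.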