2022
DOI: 10.26552/com.c.2022.3.d105-d115

SMOTE vs. Random Undersampling for Imbalanced Data - Car Ownership Demand Model

Abstract: Because the numbers of cars reflect each person's travel behaviors for each specific location, the car ownership demand model plays a dominant role in analysis of the travel demand in order to understand each area's individual and household travel behaviors. However, the study project for the master plan of the Khon Kaen expressway represented imbalanced data; namely, the majority class and the minority class were not equal. Before developing a machine learning model, this study suggested a solution to balance…

citations
Cited by 7 publications
(4 citation statements)
references
References 28 publications
“…Random Oversampling performs random replication on minority samples to balance the class distribution [7]. Meanwhile, Random Undersampling [is] used to balance the distribution of each class by randomly removing majority class samples [6]. SMOTE-NC is an oversampling technique that uses K-nearest neighbor characteristics in explanatory variables to produce synthetic data in the minority class [8].…”
Section: M-2024-495 (mentioning)
confidence: 99%
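
The two random resampling strategies described in the statement above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the procedure used in the cited papers: the function names `random_undersample` and `random_oversample`, the fixed seed, and the toy data are all hypothetical. SMOTE-NC is not sketched here because it generates synthetic samples rather than copying or dropping existing ones.

```python
import numpy as np

def random_undersample(X, y, seed=0):
    """Randomly drop samples so every class is as small as the rarest class."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    # Draw n_min indices per class without replacement.
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_min, replace=False)
        for c in classes
    ])
    keep.sort()  # preserve the original row order
    return X[keep], y[keep]

def random_oversample(X, y, seed=0):
    """Randomly replicate samples so every class is as large as the biggest class."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_max = counts.max()
    parts = []
    for c, n in zip(classes, counts):
        idx = np.flatnonzero(y == c)
        # Keep every original row and add random duplicates up to n_max.
        extra = rng.choice(idx, size=n_max - n, replace=True)
        parts.append(np.concatenate([idx, extra]))
    idx_all = np.concatenate(parts)
    return X[idx_all], y[idx_all]

# Hypothetical toy data: 8 majority samples (class 0), 2 minority samples (class 1).
X = np.arange(20).reshape(10, 2)
y = np.array([0] * 8 + [1] * 2)

X_under, y_under = random_undersample(X, y)
X_over, y_over = random_oversample(X, y)
```

On the toy data, undersampling leaves two rows per class and oversampling leaves eight rows per class. In practice, resampling of this kind is normally applied to the training split only, so that evaluation data keeps the original class distribution.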
“…Imbalanced data is data that has an unbalanced distribution of response variable classes, the number of one class is less or more than the number of other data classes [5]. Imbalanced class that is not resolved can affect the performance of the model used [6]. The data balancing methods used in this research are Random Oversampling, Random Undersampling, and Synthetic Minority Over-sampling Technique for Nominal and Continuous (SMOTE-NC).…”
Section: Introduction (mentioning)
confidence: 99%
“…DT is a tree-based classifier in which the attribute that produces the highest information gain, or minimum Gini index at each tree level is selected to partition the data into increasingly homogeneous subgroups whilst RF is a type of ensemble algorithm that constructs a specified number of trees, with each tree constructed by sampling a subset from the training set at random and with replacement (Chaipanha and Kaewwichian, 2022; Shah et al., 2020). Conversely, MLP is a fully-connected neural network with an input layer, hidden layer(s) and an output layer.…”
Section: Modelling (mentioning)
confidence: 99%
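
The split criterion and the sampling step mentioned in the statement above can be illustrated with a short sketch. It assumes the standard definition of Gini impurity (one minus the sum of squared class proportions) and a plain bootstrap draw; the helper names `gini_impurity`, `weighted_gini`, and `bootstrap_indices` and the toy arrays are hypothetical and do not come from the cited papers. Information gain, the other criterion named above, is not shown.

```python
import numpy as np

def gini_impurity(labels):
    """Gini impurity of a label array: 1 - sum(p_k^2) over the classes present."""
    if len(labels) == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(1.0 - np.sum(p ** 2))

def weighted_gini(labels, mask):
    """Size-weighted Gini impurity of the two groups produced by a boolean split."""
    n = len(labels)
    left, right = labels[mask], labels[~mask]
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)

def bootstrap_indices(n, seed=0):
    """Draw n row indices at random with replacement, as done for each forest tree."""
    rng = np.random.default_rng(seed)
    return rng.integers(0, n, size=n)

# Hypothetical toy data: one feature that separates the two classes perfectly.
labels = np.array([0, 0, 0, 1, 1, 1])
feature = np.array([1, 2, 3, 10, 11, 12])

parent = gini_impurity(labels)                 # impurity before splitting
child = weighted_gini(labels, feature < 5)     # impurity after splitting at 5
sample = bootstrap_indices(len(labels))        # rows one tree would train on
```

A tree builder would evaluate `weighted_gini` for each candidate split and keep the one with the lowest value. Here the split at 5 yields two pure groups, so the impurity falls from 0.5 to 0.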
“…Study: Chaipanha and Kaewwichian [47]; Study Aim: To provide a way for balancing the data using over- and under-sampling strategies; Model(s) Used: kNN, NB, DTs; Hyperparameter Optimization: No. Study: Manjushree, GH, Swamy and Giridharan [6]; Study Aim: To apply ML models to forecast the household characteristics that influence car ownership.…”
Section: table with columns Study, Study Aim, Model(s) Used, Hyperparameter Optimization (mentioning)
confidence: 99%