2021
DOI: 10.3390/sym13020194

A New Oversampling Method Based on the Classification Contribution Degree

Abstract: Data imbalance is a thorny issue in machine learning. SMOTE is a famous oversampling method of imbalanced learning. However, it has some disadvantages such as sample overlapping, noise interference, and blindness of neighbor selection. In order to address these problems, we present a new oversampling method, OS-CCD, based on a new concept, the classification contribution degree. The classification contribution degree determines the number of synthetic samples generated by SMOTE for each positive sample. OS-CCD…
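The abstract does not spell out how the classification contribution degree is computed, but the step it modulates is the standard SMOTE interpolation between a minority sample and one of its minority-class nearest neighbors. Below is a minimal Python sketch of that step, assuming the per-sample synthetic counts are supplied externally; the function name `smote_with_counts` and the `ccd_counts` argument are hypothetical stand-ins, not the paper's actual implementation.

```python
import numpy as np

def smote_with_counts(X_pos, ccd_counts, k=5, rng=None):
    """SMOTE-style synthesis where minority sample i spawns ccd_counts[i]
    synthetic points. ccd_counts is a hypothetical stand-in for the
    degree-based allocation the abstract describes."""
    rng = rng or np.random.default_rng(0)
    # Pairwise distances among the minority (positive) samples only
    d = np.linalg.norm(X_pos[:, None, :] - X_pos[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                   # exclude self-matches
    neighbors = np.argsort(d, axis=1)[:, :k]      # k nearest minority neighbors

    synthetic = []
    for i, n_i in enumerate(ccd_counts):
        for _ in range(int(n_i)):
            j = neighbors[i, rng.integers(k)]     # pick a random neighbor
            gap = rng.random()                    # interpolation factor in [0, 1)
            synthetic.append(X_pos[i] + gap * (X_pos[j] - X_pos[i]))
    return np.asarray(synthetic)
```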

Cited by 62 publications (26 citation statements) · References 35 publications
“…Chang et al. applied the oversampling method to molecular description data and reported that it could be used to reduce the overfitting problem [59]. However, oversampling has some disadvantages, such as sample overlapping, noise interference, and blindness of neighbor selection [60]. The main disadvantage of oversampling is that, by making copies of existing data, overfitting becomes likely; in contrast, the main disadvantage of undersampling is the discarding of potentially useful data [61].…”
Section: Discussion
confidence: 99%
“…Other approaches to oversampling include, but are not limited to, the work of [91,92,93,94,78,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119]. What all oversampling methods have in common is the validation process: the classifier used on the oversampled datasets is evaluated with one or more performance measures, such as Accuracy, Precision, Recall, F-measure, G-mean, Specificity, Kappa, Matthews correlation coefficient (MCC), Area under the ROC Curve (AUC), True positive rate, False negative (FN), False positive (FP), True positive (TP), True negative (TN), and the ROC curve. Table 1 lists 72 oversampling methods, including their known names, references, the number of datasets utilized, the number of classes in these datasets, the classifiers employed, and the performance metrics used to validate the classification results after oversampling.…”
Section: Literature Review of Oversampling Methods
confidence: 99%
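As an illustration of two of the measures in this list (not code from the cited survey), G-mean and MCC can be computed directly from confusion-matrix counts; the helper below is hypothetical.

```python
import math

def gmean_mcc(tp, fp, fn, tn):
    """G-mean and Matthews correlation coefficient from confusion-matrix
    counts (illustrative helper, not from the cited survey)."""
    recall = tp / (tp + fn)                # true positive rate
    specificity = tn / (tn + fp)           # true negative rate
    gmean = math.sqrt(recall * specificity)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return gmean, mcc

# Example: a classifier evaluated on an imbalanced test set
print(gmean_mcc(tp=20, fp=5, fn=10, tn=100))
```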
“…For this reason, the existing data set was enriched with more records using the standard SMOTE method [23]. SMOTE is a popular machine-learning method for oversampling [29], in which synthetic examples of the minority class are generated in the feature space by interpolating between a minority sample and one of its selected k-nearest neighbors (k-NN) within the minority class [21]. This practice has been adopted in several biomedical studies [4,30–36].…”
Section: Data Enrichment
confidence: 99%
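For reference, here is a minimal sketch of this kind of SMOTE-based data enrichment using the imbalanced-learn library; the dataset below is synthetic, not the one from the cited study.

```python
from collections import Counter

from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE

# Synthetic imbalanced dataset (roughly 10% minority class)
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
print("before:", Counter(y))

# SMOTE interpolates between each minority sample and one of its
# k nearest minority-class neighbors to create synthetic records.
X_res, y_res = SMOTE(k_neighbors=5, random_state=0).fit_resample(X, y)
print("after:", Counter(y_res))
```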