2020
DOI: 10.1002/int.22230

A self‐adaptive synthetic over‐sampling technique for imbalanced classification

Abstract: Traditionally, in supervised machine learning, a significant part of the available data (usually 50%-80%) is used for training and the rest for validation. In many problems, however, the data are highly imbalanced with regard to the different classes or do not have good coverage of the feasible data space, which, in turn, creates problems in the validation and usage phases. In this paper, we propose a technique for synthesizing feasible and likely data to help balance the classes as well as to boost the performance in…
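
The abstract describes balancing classes by synthesizing feasible and likely data. As a rough illustration of the general idea only (the sketch below is a generic SMOTE-style interpolation heuristic, not the paper's SASYNO algorithm; all names and parameters are assumptions), in Python:

    import numpy as np

    def interpolate_minority(X_min, n_new, seed=0):
        # Toy SMOTE-style oversampling: draw random pairs of minority-class
        # samples and return points on the segments between them.
        # Illustrative only -- this is NOT the paper's SASYNO algorithm.
        rng = np.random.default_rng(seed)
        i = rng.integers(0, len(X_min), size=n_new)    # first endpoint of each pair
        j = rng.integers(0, len(X_min), size=n_new)    # second endpoint
        t = rng.random((n_new, 1))                     # interpolation coefficient in [0, 1]
        return X_min[i] + t * (X_min[j] - X_min[i])    # convex combinations

    # Usage: top up the minority class (label 1) to match the majority.
    X = np.vstack([np.random.randn(100, 2), np.random.randn(10, 2) + 3.0])
    y = np.array([0] * 100 + [1] * 10)
    X_syn = interpolate_minority(X[y == 1], n_new=90)
    X_bal = np.vstack([X, X_syn])
    y_bal = np.concatenate([y, np.ones(90, dtype=int)])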


Cited by 52 publications (18 citation statements); references 28 publications.
Citation types: 0 supporting, 18 mentioning, 0 contrasting.
“…In addition, the presence of less-represented classes led to the use of two techniques for synthesizing data and thereby helping to balance the classes (malignant and benign), making it possible to better train the considered classifiers: the self-adaptive synthetic over-sampling (SASYNO) approach and the adaptive synthetic sampling (ADASYN) approach. This made it possible to boost the performance both overall and in terms of the confusion matrix [47, 48, 49, 50, 51].…”
Section: Methods · Citation type: mentioning (confidence: 99%)
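
Of the two oversamplers the excerpt names, ADASYN has a widely used implementation in the imbalanced-learn library. A minimal usage sketch, assuming that library is available (SASYNO itself is not part of it, and the toy data set is an assumption):

    from collections import Counter
    from sklearn.datasets import make_classification
    from imblearn.over_sampling import ADASYN  # assumes imbalanced-learn is installed

    # A toy imbalanced binary problem (roughly 9:1 class ratio).
    X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
    print(Counter(y))                 # minority class heavily under-represented

    # ADASYN adaptively generates more synthetic samples for minority points
    # that are harder to learn (those with many majority-class neighbours).
    X_res, y_res = ADASYN(random_state=0).fit_resample(X, y)
    print(Counter(y_res))             # classes approximately balanced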
“…Then, it imposes a Gaussian disturbance on these data samples and, finally, it generates synthetic samples by creating linear interpolations between these extrapolations. A further difference from our recent method [13] is that in this paper we use the standard deviation σ as the radius of influence around the prototype, rather than the absolute first-order distance. We then augment the training data set with this synthetically generated data set as shown in Fig.…”
Section: B. Balancing Classes Through Synthesising Training Data Stra… · Citation type: mentioning (confidence: 99%)
“…We achieve this by synthetic data augmentation. In this paper we propose an approach different from our recently published one [13] for synthesising data for highly imbalanced classification problems. The difference is that in this paper we synthesise data around prototypes, which makes these synthetic data more likely to have the same class as the prototype.…”
Section: B. Balancing Classes Through Synthesising Training Data Stra… · Citation type: mentioning (confidence: 99%)
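
Read together, the two excerpts above outline a strategy: perturb samples with Gaussian noise scaled by the standard deviation σ (the radius of influence around a class prototype), then interpolate linearly between the perturbed points, keeping the synthetic data near the prototype so it likely shares its class. A minimal sketch of that strategy; the function and parameter names are illustrative assumptions, not the cited authors' implementation:

    import numpy as np

    def synthesise_around_prototype(X_class, n_new, seed=0):
        # Sketch of the strategy in the excerpts above: treat the per-feature
        # standard deviation sigma as the radius of influence around the class
        # prototype, impose a Gaussian disturbance on randomly chosen samples,
        # then linearly interpolate between pairs of disturbed samples.
        rng = np.random.default_rng(seed)
        sigma = X_class.std(axis=0)                  # per-feature radius of influence
        i = rng.integers(0, len(X_class), size=n_new)
        j = rng.integers(0, len(X_class), size=n_new)
        a = X_class[i] + rng.normal(size=(n_new, X_class.shape[1])) * sigma
        b = X_class[j] + rng.normal(size=(n_new, X_class.shape[1])) * sigma
        t = rng.random((n_new, 1))                   # interpolation coefficients
        return a + t * (b - a)                       # synthetic points near the prototype

The training set would then be augmented with the returned synthetic points, one call per under-represented class, as the first excerpt describes.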
“…Few-shot learning: few-shot learning, based on meta-learning, typically uses episodic training strategies [31,32]. In each episode, the meta-learning model is trained on a meta-task, which can be viewed as a small classification task [33,34]. During training, the tasks are randomly selected from the training data set in each episode.…”
Section: Related Work · Citation type: mentioning (confidence: 99%)
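
As context for the episodic training the excerpt describes, here is a minimal sketch of sampling one N-way K-shot episode (meta-task) from a labelled data set; the function and parameter names are illustrative assumptions:

    import random
    from collections import defaultdict

    def sample_episode(dataset, n_way=5, k_shot=1, q_queries=5, rng=random):
        # Sample one N-way K-shot episode: pick n_way classes, then k_shot
        # support and q_queries query examples per class. Each episode is the
        # small classification task the meta-learner is trained on.
        by_class = defaultdict(list)
        for x, y in dataset:
            by_class[y].append(x)
        classes = rng.sample(sorted(by_class), n_way)
        support, query = [], []
        for label, cls in enumerate(classes):        # relabel classes 0..n_way-1
            examples = rng.sample(by_class[cls], k_shot + q_queries)
            support += [(x, label) for x in examples[:k_shot]]
            query += [(x, label) for x in examples[k_shot:]]
        return support, query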