2021
DOI: 10.1145/3510834

Distance-based Probabilistic Data Augmentation for Synthetic Minority Oversampling

Abstract: Class imbalance can adversely affect the performance of machine learning for prediction and classification. One approach to address the class imbalance problem is synthetic minority oversampling. Oversampling approaches can be broadly categorized as either being structural or statistical in nature. Structural approaches generally have the advantage of identifying and oversampling those minority data points that best facilitate class separation, while statistical approaches model the underlying distribution fro…
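To make the structural/statistical distinction in the abstract concrete, here is a minimal sketch of a purely statistical oversampler. It is not the method proposed in this paper. For brevity it assumes that the minority class is adequately described by a single multivariate Gaussian, and the function name `gaussian_oversample` and its parameters are hypothetical. Synthetic points are drawn from the fitted distribution rather than placed relative to particular minority samples.

```python
import numpy as np


def gaussian_oversample(X_min, n_new, seed=0):
    """Hypothetical statistical oversampler (illustration only): fit one
    multivariate Gaussian to the minority class and sample n_new points."""
    rng = np.random.default_rng(seed)
    mean = X_min.mean(axis=0)
    # rowvar=False: rows are samples, columns are features.
    # atleast_2d keeps the single-feature case as a 1x1 matrix.
    cov = np.atleast_2d(np.cov(X_min, rowvar=False))
    # A small ridge keeps the covariance positive definite for tiny classes.
    cov = cov + 1e-6 * np.eye(X_min.shape[1])
    return rng.multivariate_normal(mean, cov, size=n_new)


# Usage: four minority points in two dimensions, ten synthetic samples.
X_min = np.array([[1.0, 2.0], [1.5, 1.8], [0.8, 2.3], [1.2, 2.1]])
X_syn = gaussian_oversample(X_min, n_new=10)
print(X_syn.shape)  # (10, 2)
```

A single Gaussian is the simplest possible distribution model; it ignores where the class boundary lies, which is the property the abstract attributes to structural approaches instead.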

Cited by 6 publications (3 citation statements). References: 46 publications.
“…Decomposition methods feature using decomposition techniques such as empirical mode decomposition to generate new data that preserves the information of the original data [93]. Statistical generative methods employ modeling the dynamics of data using statistical models based on the data [94]. Learning-based methods feature modeling the dynamics of data, primarily through AI techniques such as GAN [95].…”
Section: A. Uncertainty of Information (mentioning, confidence: 99%)
“…It is clear that the sample associated with C₁ landing within the boundary of the closed … The mechanics of candidate acceptance partially mirrors that of candidate generation, i.e., we look to find candidates whose kNN distance to the training set is sufficiently greater than the kNN distances within the training set. The mathematical justification for this observation [40] follows from divergence between the distribution of closed set sample f(X) and the distribution of synthetic anomalies f(A)…”
Section: A. Candidate Synthetic Anomaly Generation (mentioning, confidence: 99%)
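The acceptance rule quoted above can be sketched as follows. The snippet does not give the exact distance statistic or threshold, so this version assumes Euclidean distance, brute-force neighbor search, the distance to the k-th nearest neighbor as the kNN distance, and a threshold equal to a quantile of the within-training kNN distances. The names `kth_nn_dist` and `accept_candidates` and the parameters `k` and `quantile` are hypothetical.

```python
import numpy as np


def kth_nn_dist(A, B, k, exclude_self=False):
    """Distance from each row of A to its k-th nearest row of B (brute force)."""
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    if exclude_self:
        # Only valid when A is B: ignore each point's zero distance to itself.
        np.fill_diagonal(d, np.inf)
    d.sort(axis=1)
    return d[:, k - 1]


def accept_candidates(X_train, candidates, k=3, quantile=0.95):
    """Hypothetical acceptance rule: keep candidates whose k-NN distance to the
    training set exceeds the chosen quantile of within-training k-NN distances."""
    within = kth_nn_dist(X_train, X_train, k, exclude_self=True)
    threshold = np.quantile(within, quantile)
    to_train = kth_nn_dist(candidates, X_train, k)
    return candidates[to_train > threshold]


# Usage: a 5x5 integer grid as the training set.
X_train = np.array([[i, j] for i in range(5) for j in range(5)], dtype=float)
cands = np.array([[2.0, 2.0], [2.5, 2.5], [6.0, 2.0], [10.0, 10.0]])
print(accept_candidates(X_train, cands, k=3))  # keeps (6, 2) and (10, 10) only
```

Tying the threshold to the training set's own kNN distances makes "sufficiently greater" scale-aware. The quadratic brute-force search is only suitable for small examples.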
“…However, this approach may result in over-fitting by over-emphasizing noisy minority samples. The second approach for increasing the number of minority class samples is to generate new synthetic minority class samples (Abd Elrahman & Abraham, 2013; Chawla et al., 2002; Wan et al., 2017; Goodman et al., 2022).…”
Section: Introduction (mentioning, confidence: 99%)
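The snippet above cites Chawla et al. (2002), the original SMOTE paper, as an example of generating new synthetic minority samples. A minimal sketch of SMOTE-style interpolation follows. It assumes Euclidean distance, brute-force neighbor search, and base points drawn uniformly at random. `smote_sketch` is a hypothetical helper for illustration. It is neither a reference implementation nor the method of the paper described on this page. Each synthetic point lies on the segment between a randomly chosen minority sample and one of its k nearest minority neighbors.

```python
import numpy as np


def smote_sketch(X_min, n_new, k=5, seed=0):
    """Hypothetical SMOTE-style oversampler (after Chawla et al., 2002)."""
    rng = np.random.default_rng(seed)
    n = len(X_min)
    k = min(k, n - 1)  # cannot use more neighbors than other minority points
    # Pairwise distances within the minority class; exclude self-matches.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    neighbors = np.argsort(d, axis=1)[:, :k]  # k nearest minority neighbors
    base = rng.integers(0, n, size=n_new)  # random base points
    nbr = neighbors[base, rng.integers(0, k, size=n_new)]  # one neighbor each
    gap = rng.random((n_new, 1))  # interpolation factor u in [0, 1)
    return X_min[base] + gap * (X_min[nbr] - X_min[base])


# Usage: four minority points at the corners of the unit square.
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
X_syn = smote_sketch(X_min, n_new=20, k=2)
print(X_syn.shape)  # (20, 2)
```

Because new points are convex combinations of existing minority samples, they stay inside the region the minority class already occupies. For production use, the imbalanced-learn package provides a maintained `SMOTE` class.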