The 2010 International Joint Conference on Neural Networks (IJCNN)
DOI: 10.1109/ijcnn.2010.5596787

Efficient resampling methods for training support vector machines with imbalanced datasets

Abstract: Random undersampling and oversampling are simple but well-known resampling methods applied to address the problem of class imbalance. In this paper we show that random oversampling can produce better classification results than random undersampling, since oversampling increases the minority-class recognition rate while sacrificing less of the majority-class recognition rate than undersampling does. However, random oversampling increases the computational cost as…
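For concreteness, here is a minimal sketch of the two baselines the abstract compares, written against NumPy and scikit-learn. The toy dataset, class sizes, and the 1:1 balancing target are illustrative assumptions rather than details from the paper; the point to notice is that oversampling enlarges the training set (the computational-cost issue the abstract raises) while undersampling shrinks it.

```python
# Minimal sketch of random oversampling vs. random undersampling for SVM
# training. All dataset details below are illustrative assumptions.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def random_oversample(X, y, minority=1):
    """Duplicate random minority samples until both classes are equal in size."""
    min_idx = np.flatnonzero(y == minority)
    maj_idx = np.flatnonzero(y != minority)
    extra = rng.choice(min_idx, size=len(maj_idx) - len(min_idx), replace=True)
    idx = np.concatenate([maj_idx, min_idx, extra])
    return X[idx], y[idx]

def random_undersample(X, y, minority=1):
    """Keep all minority samples and a random subset of the majority class."""
    min_idx = np.flatnonzero(y == minority)
    maj_idx = np.flatnonzero(y != minority)
    keep = rng.choice(maj_idx, size=len(min_idx), replace=False)
    idx = np.concatenate([keep, min_idx])
    return X[idx], y[idx]

# Imbalanced toy data: 500 majority (class 0) vs 50 minority (class 1) points.
X = np.vstack([rng.normal(0, 1, (500, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([0] * 500 + [1] * 50)

for name, sampler in [("oversample", random_oversample),
                      ("undersample", random_undersample)]:
    Xr, yr = sampler(X, y)
    clf = SVC(kernel="rbf").fit(Xr, yr)  # SVM training cost grows with len(Xr)
    print(name, "training-set size:", len(Xr))
```

Running this prints a training set of 1,000 points for oversampling against 100 for undersampling, which is exactly the cost/recognition trade-off the abstract discusses.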

Cited by 85 publications (51 citation statements) | References 11 publications

Citation statements:
“…One common approach in the literature to overcome this challenge is to undersample the majority class to balance the class distribution. 49,50 Since a random undersampling approach could discard potentially useful data and cause further problems, we decided to employ a stratified sampling scheme based on k-medoids clustering 51 to attain a positive/negative class ratio of 1:1. As mentioned earlier, we used half of the positive and negative examples for training/validation and reserved the other half of the data for testing.…”
Section: Results (citation type: mentioning; confidence: 99%)
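A hedged sketch of the cluster-based stratified undersampling this statement describes: core scikit-learn ships k-means rather than k-medoids, so KMeans stands in for the k-medoids step here, and the per-cluster proportional quotas and cluster count are assumptions about the cited scheme, not its exact recipe.

```python
# Stratified undersampling of the majority class via clustering.
# KMeans is a stand-in for the k-medoids step the statement mentions;
# quotas and n_clusters are assumptions, not the cited paper's settings.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

def stratified_undersample(X_maj, n_target, n_clusters=10):
    """Cluster the majority class, then draw from each cluster in proportion
    to its size so the kept subset preserves the majority-class structure."""
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=0).fit_predict(X_maj)
    keep = []
    for c in range(n_clusters):
        members = np.flatnonzero(labels == c)
        if len(members) == 0:
            continue
        quota = max(1, round(n_target * len(members) / len(X_maj)))
        keep.extend(rng.choice(members, size=min(quota, len(members)),
                               replace=False))
    return X_maj[np.array(keep)]

# Shrink 1,000 majority points down to match 100 minority points (1:1 ratio).
X_maj = rng.normal(0, 1, (1000, 5))
X_kept = stratified_undersample(X_maj, n_target=100)
print(X_kept.shape)  # roughly (100, 5); per-cluster rounding may shift it slightly
```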
“…The goal of our work was not to obtain the best possible results for splice site classification, which has already been successfully addressed by Sonnenburg et al. (2007) using SVM and specialised kernels, but rather to explore semi-supervised learning as a possible solution for splice site prediction, and to study the effects of imbalanced distributions on semi-supervised learning algorithms. Batuwita and Palade (2010) applied SVM with re-sampling methods to four imbalanced biological datasets and the Pageblocks dataset from the UCI repository, with up to 8K instances and an imbalance degree of no more than 1-to-50. They proposed to first identify the most informative negative instances, and then randomly over-sample the positive instances in order to reach the same number of negative instances selected.…”
Section: Related Work (citation type: mentioning; confidence: 99%)
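The two-step scheme this statement attributes to Batuwita and Palade can be sketched as follows. Reading "most informative negatives" as "negatives closest to the decision boundary of a pilot SVM" is an assumption on this page, as are all dataset details; only the overall shape (filter negatives, then oversample positives to match) comes from the statement.

```python
# Two-step resampling: keep only "informative" negatives, then randomly
# oversample positives to match. Treating "informative" as "nearest to a
# pilot SVM's decision boundary" is an assumption, not the paper's exact rule.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def resample(X, y, n_keep_neg):
    # Step 1: train a pilot SVM and keep the negatives nearest its boundary,
    # i.e. those with the smallest absolute decision-function values.
    pilot = SVC(kernel="rbf").fit(X, y)
    neg_idx = np.flatnonzero(y == 0)
    margin = np.abs(pilot.decision_function(X[neg_idx]))
    informative_neg = neg_idx[np.argsort(margin)[:n_keep_neg]]

    # Step 2: randomly oversample positives up to the kept negative count.
    pos_idx = np.flatnonzero(y == 1)
    pos_over = rng.choice(pos_idx, size=n_keep_neg, replace=True)

    idx = np.concatenate([informative_neg, pos_over])
    return X[idx], y[idx]

# Toy imbalanced data: 800 negatives vs 40 positives.
X = np.vstack([rng.normal(0, 1, (800, 2)), rng.normal(1.5, 1, (40, 2))])
y = np.array([0] * 800 + [1] * 40)
Xb, yb = resample(X, y, n_keep_neg=200)
print(np.bincount(yb))  # balanced: [200 200]
```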
“…Multi-category data clustering suffers from several open issues: multiple-feature evaluation, selection of the number of clusters for multi-level clustering [1,3,6], diversity of the feature selection process [12], and the boundary values of clusters [9,13].…”
Section: IV (citation type: mentioning; confidence: 99%)