2011
DOI: 10.1504/ijkesdp.2011.039875

Borderline over-sampling for imbalanced data classification

Abstract: Traditional classification algorithms often perform poorly on imbalanced data sets, in which some classes are heavily outnumbered by the remaining classes. For this kind of data, minority class instances, which are usually of much greater interest, are often misclassified. The paper proposes a method to deal with such data sets by changing the class distribution through over-sampling at the borderline between the minority class and the majority class of the data set. A support vector machines (SVM) classifier…
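
To make the abstract's idea concrete, the sketch below illustrates borderline over-sampling guided by an SVM: minority-class support vectors are treated as the borderline instances, and synthetic points are interpolated from them toward other minority instances. This is a minimal illustration, not the authors' exact algorithm; the linear kernel, the use of scikit-learn's SVC, and the interpolation step are assumptions.

import numpy as np
from sklearn.svm import SVC

def borderline_oversample(X, y, minority_label, n_new, seed=0):
    # Fit an SVM; its minority-class support vectors lie near the
    # borderline between the classes (assumption: linear kernel).
    rng = np.random.default_rng(seed)
    svm = SVC(kernel="linear").fit(X, y)
    sv = svm.support_                         # indices of all support vectors
    border = sv[y[sv] == minority_label]      # borderline minority instances
    minority = np.where(y == minority_label)[0]

    synthetic = np.empty((n_new, X.shape[1]))
    for s in range(n_new):
        i = rng.choice(border)                # a borderline minority instance
        j = rng.choice(minority)              # a random minority instance
        lam = rng.random()                    # interpolation coefficient in [0, 1)
        synthetic[s] = X[i] + lam * (X[j] - X[i])

    X_new = np.vstack([X, synthetic])
    y_new = np.concatenate([y, np.full(n_new, minority_label)])
    return X_new, y_new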

Cited by 460 publications (236 citation statements)
References 26 publications
“…A more complex over-sampling technique interpolates synthetic minority instances between two existing ones [3]. Some studies have found that over-sampling of the minority class in borderline regions can provide better results [6], [10]. The simplest under-sampling technique is to randomly remove a number of majority instances, while a more intelligent technique [16] discards only those majority instances that are redundant, borderline, or noisy.…”
Section: Background and Related Work
confidence: 99%
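
The interpolation described in [3] (SMOTE) amounts to one line of arithmetic per synthetic point. A minimal sketch, assuming NumPy and two illustrative minority instances x_i and x_j:

import numpy as np

rng = np.random.default_rng(0)
x_i = np.array([1.0, 2.0])         # an existing minority instance
x_j = np.array([2.0, 3.5])         # a second, neighboring minority instance
lam = rng.random()                 # uniform in [0, 1)
x_new = x_i + lam * (x_j - x_i)    # synthetic point on the segment from x_i to x_j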
“…Standard learning algorithms are usually biased toward the majority class in order to increase overall accuracy, and therefore reduce predictive accuracy on the minority class. The most popular approach for handling the class imbalance problem is to rebalance the training set using sampling techniques [1]-[10]. An advantage of sampling is that a standard learning algorithm can simply be applied to the rebalanced training set, without any need to modify that algorithm.…”
Section: Introduction
confidence: 99%
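
As one concrete instance of this rebalance-then-train recipe, the sketch below randomly under-samples the majority class and then fits an unmodified standard learner. The binary setting, the toy data, and the choice of scikit-learn's LogisticRegression are assumptions for illustration.

import numpy as np
from sklearn.linear_model import LogisticRegression

def random_undersample(X, y, majority_label, seed=0):
    # Keep every minority instance plus an equal-sized random
    # subset of the majority instances.
    rng = np.random.default_rng(seed)
    maj = np.where(y == majority_label)[0]
    mino = np.where(y != majority_label)[0]
    keep = np.concatenate([mino, rng.choice(maj, size=len(mino), replace=False)])
    return X[keep], y[keep]

# Toy usage: 95 majority (label 0) vs 5 minority (label 1) instances.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (95, 2)), rng.normal(2, 1, (5, 2))])
y = np.array([0] * 95 + [1] * 5)
X_bal, y_bal = random_undersample(X, y, majority_label=0)
clf = LogisticRegression().fit(X_bal, y_bal)   # unmodified standard learner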
“…SMOTE introduces artificial instances into data sets by interpolating feature values based on neighbors. Several studies have shown that SMOTE performs better than plain under-sampling and over-sampling techniques [3][4][5][6][7]. Moreover, SMOTE does not cause any information loss and can potentially find hidden minority regions, because it identifies similar but more specific regions in the feature space as the decision region for the minority class.…”
Section: Introduction
confidence: 99%
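
Restricting interpolation to each instance's k nearest minority neighbors is what confines SMOTE to those "similar but more specific regions". A sketch of that neighbor-based generation, with scikit-learn's NearestNeighbors and the value of k as assumptions:

import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote_sketch(X_min, n_new, k=5, seed=0):
    # X_min holds minority-class instances only.
    rng = np.random.default_rng(seed)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, idx = nn.kneighbors(X_min)              # idx[:, 0] is the point itself

    synthetic = np.empty((n_new, X_min.shape[1]))
    for s in range(n_new):
        i = rng.integers(len(X_min))           # seed minority instance
        j = rng.choice(idx[i, 1:])             # one of its k minority neighbors
        synthetic[s] = X_min[i] + rng.random() * (X_min[j] - X_min[i])
    return synthetic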
“…However, the classes cannot in general be assumed to be convex, and hence SMOTE does not prevent synthetic patterns from falling inside majority regions; more careful techniques have therefore been developed to mitigate (though not solve) this issue. Adaptive synthetic [5]-[7] and cluster-based sampling methods [8], [9] are examples of more powerful techniques, based on extracting knowledge from the data to analyze which patterns and regions of the space are more suitable for oversampling. This will be referred to in this paper as preferential oversampling.…”
confidence: 99%
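
The adaptive idea can be sketched by weighting each minority instance by how many of its neighbors come from the majority class, so that harder, more borderline regions receive more synthetic points. This is in the spirit of adaptive synthetic sampling such as [5], not an implementation of it; the value of k and the helper name are assumptions.

import numpy as np
from sklearn.neighbors import NearestNeighbors

def adaptive_weights(X, y, minority_label, k=5):
    # For each minority instance, measure the fraction of majority
    # instances among its k nearest neighbors in the full data set.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X[y == minority_label])   # idx[:, 0] is the point itself
    maj_frac = (y[idx[:, 1:]] != minority_label).mean(axis=1)
    if maj_frac.sum() == 0:                 # no minority instance near the border
        return np.full(len(maj_frac), 1.0 / len(maj_frac))
    return maj_frac / maj_frac.sum()        # larger weight -> more synthetic points

# The weights can then drive how many synthetic points to interpolate
# around each minority instance, e.g. counts = np.round(weights * n_new).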
“…Ideally, a better-fitted kernel will increase the class separability, providing a safer environment for the generation of synthetic patterns. The last part of this paper proposes a unified adaptive framework for preferential oversampling that generalizes several oversampling approaches in the literature [3], [5], [6]. The optimal SVM hyperplane and kernel-learning techniques are used to optimize the synthetically generated patterns.…”
confidence: 99%