2007
DOI: 10.1007/s00521-007-0089-7

A data reduction approach for resolving the imbalanced data issue in functional genomics

Abstract: Learning from imbalanced data occurs frequently in many machine learning applications. One positive example to thousands of negative instances is common in scientific applications. Unfortunately, traditional machine learning techniques often treat rare instances as noise. One popular approach to this difficulty is to resample the training data; however, this can result in a high rate of false positive predictions. Hence, we propose preprocessing the training data by partitioning it into clusters. This greatly reduces the im…
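The abstract is truncated above, so the paper's exact procedure is not fully visible here. As an illustration only, the sketch below shows one common way to realize cluster-based data reduction for a binary imbalanced problem: cluster the majority class and keep a single representative per cluster while preserving every minority instance. The function name, the choice of k-means, and all parameters are assumptions for this sketch, not the authors' published method.

```python
# Illustrative sketch of cluster-based data reduction for an imbalanced
# binary problem: cluster the majority class and keep the point nearest
# each centroid as its representative, preserving all minority instances.
# Function and parameter names are hypothetical.
import numpy as np
from sklearn.cluster import KMeans

def reduce_majority_by_clustering(X, y, majority_label=0, n_clusters=50, seed=0):
    X_maj = X[y == majority_label]
    X_min = X[y != majority_label]
    y_min = y[y != majority_label]

    km = KMeans(n_clusters=n_clusters, random_state=seed, n_init=10).fit(X_maj)

    reps = []
    for c in range(n_clusters):
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(X_maj[members] - km.cluster_centers_[c], axis=1)
        reps.append(members[np.argmin(dists)])  # closest majority point to centroid c

    X_red = np.vstack([X_maj[reps], X_min])
    y_red = np.concatenate([np.full(len(reps), majority_label), y_min])
    return X_red, y_red
```

With thousands of negatives and a handful of positives, choosing n_clusters close to the number of positive examples yields a roughly balanced reduced training set.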

Cited by 44 publications (19 citation statements) · References 8 publications · Citing publications: 2008–2021
“…when positive examples are significantly fewer than those negative, and imbalance-aware strategies are required [16,67,35,34]. Unfortunately, several graph-based prediction problems are characterized by strongly unbalanced labelings [13,53,21].…”
Section: Introduction (mentioning)
confidence: 99%
“…Chawla et al [16] proposed the synthetic minority oversampling technique (SMOTE) algorithm in which the minority class was oversampled by taking each minority class sample and introducing new synthetic examples joining any or all the minority class nearest neighbors. Further, algorithms that combine SMOTE and other learning methods have also been applied to solve the class imbalance problem [17][18][19].…”
Section: Introduction (mentioning)
confidence: 99%
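The excerpt above describes the core SMOTE step: each minority sample is paired with one of its minority-class nearest neighbours, and a synthetic point is placed on the segment between them. The following is a minimal, self-contained sketch of that interpolation; the function name and defaults are assumptions for illustration, not the reference implementation (a maintained implementation is available as SMOTE in the imbalanced-learn package).

```python
# Minimal sketch of SMOTE-style interpolation: for each synthetic point,
# pick a minority sample, one of its k nearest minority neighbours, and
# a random position on the segment between the two.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote_sketch(X_min, k=5, n_new=100, seed=0):
    rng = np.random.default_rng(seed)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)  # +1: each point is its own neighbour
    _, idx = nn.kneighbors(X_min)

    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))       # a minority sample
        j = rng.choice(idx[i][1:])         # one of its k minority-class neighbours
        gap = rng.random()                 # random position along the segment
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.asarray(synthetic)
```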
“…A problem that we focus on this paper is how to learn a classification task from imbalanced data sets. The class imbalance problem, which is one of the fundamental problems in machine learning, has received much attention recently [1,4,9,14,15]. In many real-world diagnostic applications, e.g., computer security, biomedical, and engineering, uneven distribution of data patterns is very common, where number of training instances of a minority class is much smaller compared to other majority classes; as a result, the classifier tends to favor the majority class [7].…”
Section: Introduction (mentioning)
confidence: 99%
“…In general, learning algorithms for class imbalance problems can be divided into two categories: resampling and cost-sensitive based. Resampling methods such as over-sampling and under-sampling [7,14] modify the prior probability of the majority and minority class in the training set to obtain a more balanced number of instances in each class. The undersampling method extracts a smaller set of majority instances while preserving all the minority instances.…”
Section: Introduction (mentioning)
confidence: 99%
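As the excerpt notes, under-sampling keeps every minority instance and draws a smaller subset of the majority class. A hedged sketch of the plain random variant follows; the function name and defaults are illustrative only.

```python
# Sketch of random under-sampling: keep all minority instances and draw,
# without replacement, an equally sized random subset of the majority class.
import numpy as np

def random_undersample(X, y, majority_label=0, seed=0):
    rng = np.random.default_rng(seed)
    maj_idx = np.where(y == majority_label)[0]
    min_idx = np.where(y != majority_label)[0]
    keep_maj = rng.choice(maj_idx, size=len(min_idx), replace=False)
    keep = np.concatenate([keep_maj, min_idx])
    return X[keep], y[keep]
```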