2007
DOI: 10.1007/s00521-007-0089-7

A data reduction approach for resolving the imbalanced data issue in functional genomics

Abstract: Learning from imbalanced data occurs frequently in many machine learning applications. One positive example to thousands of negative instances is common in scientific applications. Unfortunately, traditional machine learning techniques often treat rare instances as noise. One popular approach to this difficulty is to resample the training data; however, this can result in a high rate of false positive predictions. Hence, we propose preprocessing the training data by partitioning it into clusters. This greatly reduces the im…
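The abstract is truncated above, so the paper's exact procedure is not fully visible here. As an illustration only, the sketch below shows one common way to realize cluster-based data reduction for a binary imbalanced problem: cluster the majority class and keep a single representative per cluster while preserving every minority instance. The function name, the choice of k-means, and all parameters are assumptions for this sketch, not the authors' published method.

```python
# Illustrative sketch of cluster-based data reduction for an imbalanced
# binary problem: cluster the majority class and keep the point nearest
# each centroid as its representative, preserving all minority instances.
# Function and parameter names are hypothetical.
import numpy as np
from sklearn.cluster import KMeans

def reduce_majority_by_clustering(X, y, majority_label=0, n_clusters=50, seed=0):
    X_maj = X[y == majority_label]
    X_min = X[y != majority_label]
    y_min = y[y != majority_label]

    km = KMeans(n_clusters=n_clusters, random_state=seed, n_init=10).fit(X_maj)

    reps = []
    for c in range(n_clusters):
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(X_maj[members] - km.cluster_centers_[c], axis=1)
        reps.append(members[np.argmin(dists)])  # closest majority point to centroid c

    X_red = np.vstack([X_maj[reps], X_min])
    y_red = np.concatenate([np.full(len(reps), majority_label), y_min])
    return X_red, y_red
```

With thousands of negatives and a handful of positives, choosing n_clusters close to the number of positive examples yields a roughly balanced reduced training set.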

Cited by 44 publications (19 citation statements) · References 8 publications · Citing publications: 2008–2021
“…when positive examples are significantly fewer than those negative, and imbalance-aware strategies are required [16,67,35,34]. Unfortunately, several graph-based prediction problems are characterized by strongly unbalanced labelings [13,53,21].…”
Section: Introduction (mentioning)
confidence: 99%
“…Chawla et al [16] proposed the synthetic minority oversampling technique (SMOTE) algorithm in which the minority class was oversampled by taking each minority class sample and introducing new synthetic examples joining any or all the minority class nearest neighbors. Further, algorithms that combine SMOTE and other learning methods have also been applied to solve the class imbalance problem [17][18][19].…”
Section: Introduction (mentioning)
confidence: 99%
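The excerpt above describes the core SMOTE step: each minority sample is paired with one of its minority-class nearest neighbours, and a synthetic point is placed on the segment between them. The following is a minimal, self-contained sketch of that interpolation; the function name and defaults are assumptions for illustration, not the reference implementation (a maintained implementation is available as SMOTE in the imbalanced-learn package).

```python
# Minimal sketch of SMOTE-style interpolation: for each synthetic point,
# pick a minority sample, one of its k nearest minority neighbours, and
# a random position on the segment between the two.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote_sketch(X_min, k=5, n_new=100, seed=0):
    rng = np.random.default_rng(seed)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)  # +1: each point is its own neighbour
    _, idx = nn.kneighbors(X_min)

    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))       # a minority sample
        j = rng.choice(idx[i][1:])         # one of its k minority-class neighbours
        gap = rng.random()                 # random position along the segment
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.asarray(synthetic)
```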
“…A problem that we focus on this paper is how to learn a classification task from imbalanced data sets. The class imbalance problem, which is one of the fundamental problems in machine learning, has received much attention recently [1,4,9,14,15]. In many real-world diagnostic applications, e.g., computer security, biomedical, and engineering, uneven distribution of data patterns is very common, where number of training instances of a minority class is much smaller compared to other majority classes; as a result, the classifier tends to favor the majority class [7].…”
Section: Introduction (mentioning)
confidence: 99%
“…In general, learning algorithms for class imbalance problems can be divided into two categories: resampling and cost-sensitive based. Resampling methods such as over-sampling and under-sampling [7,14] modify the prior probability of the majority and minority class in the training set to obtain a more balanced number of instances in each class. The undersampling method extracts a smaller set of majority instances while preserving all the minority instances.…”
Section: Introduction (mentioning)
confidence: 99%
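As the excerpt notes, under-sampling keeps every minority instance and draws a smaller subset of the majority class. A hedged sketch of the plain random variant follows; the function name and defaults are illustrative only.

```python
# Sketch of random under-sampling: keep all minority instances and draw,
# without replacement, an equally sized random subset of the majority class.
import numpy as np

def random_undersample(X, y, majority_label=0, seed=0):
    rng = np.random.default_rng(seed)
    maj_idx = np.where(y == majority_label)[0]
    min_idx = np.where(y != majority_label)[0]
    keep_maj = rng.choice(maj_idx, size=len(min_idx), replace=False)
    keep = np.concatenate([keep_maj, min_idx])
    return X[keep], y[keep]
```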