2015
DOI: 10.1016/j.knosys.2015.05.027

ROSEFW-RF: The winner algorithm for the ECBDL’14 big data competition: An extremely imbalanced big data bioinformatics problem

Abstract: The application of data mining and machine learning techniques to biological and biomedical data continues to be a ubiquitous research theme in current bioinformatics. The rapid advances in biotechnology are allowing us to obtain and store large quantities of data about cells, proteins, genes, etc., that should be processed. Moreover, in many of these problems, such as contact map prediction, the problem tackled in this paper, it is difficult to collect representative positive examples. Learning under these ci…

Cited by 128 publications (70 citation statements)
References: 36 publications
“…Modern systems generate massive amounts of information that forces us to develop computationally effective solutions for processing them. Big data can also be affected by class imbalance, posing increased challenge to learning systems [55]. Not only the increasing data volume can become prohibitive for existing methods, but also the nature of problem can cause additional difficulties.…”
Section: Imbalanced Big Data (mentioning)
confidence: 99%
“…It comes from the Evolutionary Big Data Competition ECBDL'14 [24], [25]. For this study, we consider a subset of 10% of the instances, in which the number of features was reduced from 631 to 90 by means of the feature selection algorithm applied in [25]. This dataset contains a total of 3,489,083 instances, from which 69,133 In our experiments we consider a 5-fold stratified crossvalidation model, meaning that we construct 5 random partitions of each dataset maintaining the prior probabilities of each class.…”
Section: Preliminary Results and Discussion (mentioning)
confidence: 99%
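
The 5-fold stratified cross-validation mentioned in the statement above can be illustrated with a minimal sketch. It is not the citing authors' code: the function name `stratified_folds`, the toy label list, and the round-robin assignment are all assumptions chosen for illustration. The idea is only that each fold keeps roughly the same class proportions as the full dataset.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Split example indices into k folds with near-equal class proportions.

    Hypothetical helper for illustration; not taken from the cited papers.
    """
    rng = random.Random(seed)

    # Group example indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's shuffled indices round-robin across the folds,
        # so every fold receives about 1/k of each class.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


# Toy, highly imbalanced labels (assumed values, not the ECBDL'14 data).
labels = [1] * 20 + [0] * 980
folds = stratified_folds(labels, k=5)
positives_per_fold = [sum(labels[i] for i in fold) for fold in folds]
```

With these toy labels every fold holds 200 examples, 4 of them positive, so the 2% prior of the toy data is preserved in each partition.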
“…The ROS pre-processing method for big data made SVM effective for pairwise ortholog detection and improved the performance of Random Forest for big data even more with a higher value for the resampling size parameter of 130% [41]. Conversely, the experiments showed that the variation in this parameter value from 100 to 130% did not significantly influence on the performance of the SVM big data classifier with different regulation values.…”
Section: Discussion (mentioning)
confidence: 99%
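
The resampling size parameter of random oversampling (ROS) discussed in the statement above can be sketched as follows. This is a single-machine illustration, not the distributed big data implementation the statement refers to. It assumes that the percentage is read as the target minority-class size relative to the majority class (100% balances the classes, 130% leaves the minority 1.3 times larger); the excerpt does not confirm that interpretation. The function name `random_oversample` and the toy data are hypothetical.

```python
import random


def random_oversample(majority, minority, size_pct=100, seed=0):
    """Replicate minority examples at random until the minority class
    reaches size_pct percent of the majority class size.

    Hypothetical helper; the meaning of size_pct is an assumption.
    """
    rng = random.Random(seed)
    target = (len(majority) * size_pct) // 100
    extra = max(0, target - len(minority))
    # Sample with replacement from the original minority examples.
    replicas = [rng.choice(minority) for _ in range(extra)]
    return majority + minority + replicas


# Toy data (assumed): 100 negative and 5 positive examples.
majority = [("neg", i) for i in range(100)]
minority = [("pos", i) for i in range(5)]

resampled = random_oversample(majority, minority, size_pct=130)
n_pos = sum(1 for label, _ in resampled if label == "pos")
n_neg = sum(1 for label, _ in resampled if label == "neg")
```

Under this reading, a setting of 130% turns the 5 toy positives into 130, against an unchanged 100 negatives.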