2017
DOI: 10.1186/s12859-017-1578-z
CURE-SMOTE algorithm and hybrid algorithm for feature selection and parameter optimization based on random forests

Abstract: Background: The random forests algorithm is a classifier with prominent universality, a wide application range, and robustness against overfitting. However, random forests still have some drawbacks. To improve their performance, this paper addresses imbalanced data processing, feature selection, and parameter optimization. Results: We propose the CURE-SMOTE algorithm for the imbalanced data classification problem. Experiments on imbalanced UCI data reveal that the co…
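The abstract only names the approach, so the sketch below is a rough, hedged reading of the CURE-SMOTE idea: filter minority-class noise with a clustering-style step before SMOTE interpolation. It is not the authors' exact procedure; the function name, the k-distance outlier filter standing in for CURE clustering, and all parameter values are our assumptions.

```python
import numpy as np

def cure_smote_sketch(X_min, n_synth, k=5, outlier_quantile=0.9, seed=None):
    """X_min: (n, d) minority-class samples; returns (n_synth, d) synthetic samples."""
    rng = np.random.default_rng(seed)
    # Pairwise Euclidean distances within the minority class.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    # Distance to the k-th nearest neighbour (column 0 is the sample itself).
    kth = np.sort(d, axis=1)[:, k]
    # Stand-in for CURE's noise removal: drop samples lying in sparse regions.
    keep = kth <= np.quantile(kth, outlier_quantile)
    X_clean, d_clean = X_min[keep], d[np.ix_(keep, keep)]
    # SMOTE step: interpolate between a random sample and one of its k neighbours.
    nn = np.argsort(d_clean, axis=1)[:, 1:k + 1]
    i = rng.integers(0, len(X_clean), size=n_synth)
    j = nn[i, rng.integers(0, k, size=n_synth)]
    gap = rng.random((n_synth, 1))
    return X_clean[i] + gap * (X_clean[j] - X_clean[i])
```

With the default k=5 the sketch needs at least seven or so minority samples to behave sensibly; it is meant only to make the two-stage structure concrete.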

Cited by 184 publications (77 citation statements); references 63 publications.

Citation statements (ordered by relevance):
“…The recently proposed new re-sample methods [44] can be considered in future work. Furthermore, the feature extraction functions and the pre-trained classifiers of this method can be easily embedded into the LC-MS based quantitative proteomics analysis pipeline.…”
Section: Results (citation type: mentioning; confidence: 99%)
“…Moreover, the algorithm often fails due to storage and computation defects [49]. Finally, in the RF model, each tree randomly selects some samples and some features to avoid overfitting; consequently, the model features a good anti-noise ability and stable performance [50,51]. Furthermore, the RF model can handle very high-dimensional data and omit the work associated with feature selection [52].…”
Section: Discussion (citation type: mentioning; confidence: 99%)
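The row- and column-subsampling behaviour described in that statement maps directly onto scikit-learn's RandomForestClassifier. A brief illustration; the toy dataset and the chosen parameter values are ours, not from the cited papers:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
rf = RandomForestClassifier(
    n_estimators=100,    # number of trees in the forest
    bootstrap=True,      # each tree is fit on a random bootstrap sample of the rows
    max_features="sqrt", # each split considers a random subset of the features
    random_state=0,
).fit(X, y)
```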
“…The most well-known oversampling method is synthetic minority oversampling technique (SMOTE) proposed by Chawla et al [15]. The main idea of SMOTE [15][16][17][18][19] is to identify k minority class neighbors close to each minority class sample, then randomly select a point between the sample and its neighbors as the synthetic sample. But SMOTE produces new samples with certain blindness and may make class overlapping more serious.…”
Section: Related Work (citation type: mentioning; confidence: 99%)
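The SMOTE procedure quoted above is available off the shelf in imbalanced-learn. A small example of the mechanism; the SMOTE class and k_neighbors parameter are the library's, while the toy data and values are ours:

```python
import numpy as np
from imblearn.over_sampling import SMOTE

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (90, 2)),   # majority class
               rng.normal(3.0, 1.0, (10, 2))])  # minority class
y = np.array([0] * 90 + [1] * 10)

# Each synthetic point lies on the segment between a minority sample
# and one of its k_neighbors nearest minority neighbours.
X_res, y_res = SMOTE(k_neighbors=5, random_state=0).fit_resample(X, y)
print(np.bincount(y_res))  # -> [90 90]: minority class oversampled to parity
```

The "blindness" the citing authors mention is visible here: interpolation ignores the majority-class geometry entirely, which is precisely the overlap problem CURE-SMOTE's clustering filter is meant to mitigate.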