2009
DOI: 10.1089/cmb.2008.0037
Use of Wrapper Algorithms Coupled with a Random Forests Classifier for Variable Selection in Large-Scale Genomic Association Studies

Abstract: Modern large-scale genetic association studies generate increasingly high-dimensional datasets. Therefore, some variable selection procedure should be performed before the application of traditional data analysis methods, for reasons of both computational efficiency and problems related to overfitting. We describe here a “wrapper” strategy (SIZEFIT) for variable selection that uses a Random Forests classifier, coupled with various local search/optimization algorithms. We apply it to a large dataset consisting …
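The abstract describes a wrapper loop: a search procedure proposes variable subsets, and a Random Forests classifier scores each subset. SIZEFIT itself is not reproduced here; the following is a minimal sketch of that general idea in Python (scikit-learn), assuming a numeric feature matrix X and labels y, with a simple greedy forward search standing in for the paper's local search/optimization algorithms.

# Minimal sketch of a wrapper-style variable selection loop built around a
# Random Forests classifier. This is NOT the SIZEFIT implementation from the
# paper; X, y and the greedy forward-selection heuristic are illustrative
# assumptions only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def wrapper_select(X, y, max_vars=20, cv=5, random_state=0):
    """Greedy forward selection: add the variable that most improves the
    cross-validated accuracy of the RF classifier; stop when no gain."""
    remaining = list(range(X.shape[1]))
    selected, best_score = [], -np.inf
    while remaining and len(selected) < max_vars:
        scores = []
        for j in remaining:
            cols = selected + [j]
            clf = RandomForestClassifier(n_estimators=200,
                                         random_state=random_state)
            s = cross_val_score(clf, X[:, cols], y, cv=cv).mean()
            scores.append((s, j))
        s, j = max(scores)
        if s <= best_score:          # no improvement: local optimum reached
            break
        best_score = s
        selected.append(j)
        remaining.remove(j)
    return selected, best_score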

Cited by 22 publications (15 citation statements). References 30 publications. Citing publications span 2011–2022.

Citation statements, ordered by relevance:
“…Subsequently, a set of features producing the highest accuracy by cross-validation was identified as the optimal feature subset. Many previous studies preferred to select SVM as the learning scheme due to its superiority compared to the other classifiers [12,38], but the RF classifier has also recently been used [39]. Since RF and SVM classifiers were employed as the classification techniques tested in this study (see Section 2.4), we tested two wrapper methods, and the learning schemes were set to RF and SVM classifiers, respectively, to achieve the best possible classification performance for feature selection.…”
Section: (3) SVM Recursive Feature Elimination (SVM-RFE), mentioning
confidence: 99%
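The recursive elimination named in the section heading works by repeatedly refitting the learner, ranking features (by the magnitude of the weight vector for a linear SVM, by impurity-based importance for RF), and dropping the weakest until cross-validated accuracy peaks. Below is a minimal sketch of both wrapper variants, assuming scikit-learn and a synthetic dataset in place of the study's data.

# Sketch of the two wrapper variants discussed above: recursive feature
# elimination driven by a linear SVM (SVM-RFE) and by a Random Forests
# classifier. Dataset and parameter values are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=50, n_informative=5,
                           random_state=0)

# SVM-RFE: rank features by |w| of a linear SVM, drop the weakest each round;
# RFECV picks the subset size with the best cross-validated accuracy.
svm_rfe = RFECV(SVC(kernel="linear"), step=1, cv=5).fit(X, y)

# Same wrapper with RF: ranking uses impurity-based feature importances.
rf_rfe = RFECV(RandomForestClassifier(n_estimators=200, random_state=0),
               step=1, cv=5).fit(X, y)

print("SVM-RFE kept", svm_rfe.n_features_, "features;",
      "RF-RFE kept", rf_rfe.n_features_)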
“…Díaz-Uriarte and Alvarez de Andrés [2006] suggested removing the bottom 10% and re-running until prediction decreased. Rodin et al [2009] devised a method for selecting variables based on specification of optimal model size. Goldstein et al [2010] examined the scree plots of the VI measures and used the "elbow" as the cut-off.…”
Section: Determining Important Variables, mentioning
confidence: 99%
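The Díaz-Uriarte-style rule quoted above is easy to state as a loop: fit a forest, drop the bottom 10% of variables by importance, refit, and keep the subset with the best predictive score. Here is a sketch under assumed data X, y, using out-of-bag accuracy as the selection criterion; the 10% fraction comes from the text, while the exact bookkeeping is an illustrative choice.

# Sketch of the iterative scheme attributed above to Díaz-Uriarte and
# Alvarez de Andrés: repeatedly drop the bottom 10% of variables by RF
# importance and keep the subset with the best out-of-bag (OOB) accuracy.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def iterative_elimination(X, y, drop_frac=0.10, random_state=0):
    cols = np.arange(X.shape[1])
    best_cols, best_oob = cols, -np.inf
    while len(cols) > 1:
        rf = RandomForestClassifier(n_estimators=500, oob_score=True,
                                    random_state=random_state)
        rf.fit(X[:, cols], y)
        if rf.oob_score_ > best_oob:                 # best subset so far
            best_oob, best_cols = rf.oob_score_, cols
        k = max(1, int(len(cols) * drop_frac))       # number to drop this round
        order = np.argsort(rf.feature_importances_)  # ascending importance
        cols = cols[order[k:]]                       # keep all but the bottom k
    return best_cols, best_oob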
“…Random Forests, in particular, is a randomized decision tree ensemble that has attractive scalability properties (proportional to the square root of the number of variables) in the approximately 500,000–1 million variables range, which makes it very appealing to GWAS and similar analyses ([26,30,31], see also [32] for a recent overview). Numerous software implementations specifically aimed at the genomic data exist.…”
Section: Machine Learning Methods, mentioning
confidence: 99%
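The square-root scaling mentioned in the quote reflects how a forest chooses split variables: each node inspects only a random subset of roughly sqrt(p) of the p variables, so the per-split cost grows with sqrt(p) rather than p. A one-line illustration follows; the forest size and variable count are assumed values, not figures from the cited work.

# Each split considers about sqrt(p) randomly chosen candidate variables,
# which is the source of the sqrt(p) scalability noted above.
from sklearn.ensemble import RandomForestClassifier

p = 500_000                                  # e.g., a GWAS-scale variable count
rf = RandomForestClassifier(n_estimators=1000,
                            max_features="sqrt")  # ~707 variables per split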