2010
DOI: 10.3233/fi-2010-288

Boruta – A System for Feature Selection

Abstract: Machine learning methods are often used to classify objects described by hundreds of attributes; in many applications of this kind a large fraction of the attributes may be totally irrelevant to the classification problem. Moreover, one usually cannot decide a priori which attributes are relevant. In this paper we present an improved version of the algorithm for identification of the full set of truly important variables in an information system. It is an extension of the random forest method which utilises the i…
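The abstract is cut off before it describes the mechanism. In the Boruta algorithm as published, each real attribute's random-forest importance is compared against that of "shadow" attributes, which are randomly permuted copies of the originals, and an attribute scores a "hit" whenever it beats the best shadow. The following is a minimal sketch of just that step in plain Python, assuming the importances are computed elsewhere (for example by a random forest); the function names and the `shadow_` prefix are hypothetical.

```python
import random


def add_shadows(columns: dict[str, list[float]], rng: random.Random) -> dict[str, list[float]]:
    """Return a copy of the data extended with one shuffled 'shadow' copy per attribute.

    Assumes no real attribute name already starts with 'shadow_' (hypothetical convention).
    """
    extended = {name: list(values) for name, values in columns.items()}
    for name, values in columns.items():
        shadow = list(values)
        rng.shuffle(shadow)  # permuting breaks any link between the attribute and the class
        extended["shadow_" + name] = shadow
    return extended


def record_hits(importance: dict[str, float], hits: dict[str, int]) -> None:
    """Add one hit for every real attribute whose importance exceeds the best shadow's.

    `importance` is assumed to come from an external model trained on the extended data.
    """
    best_shadow = max(v for k, v in importance.items() if k.startswith("shadow_"))
    for name, value in importance.items():
        if not name.startswith("shadow_") and value > best_shadow:
            hits[name] = hits.get(name, 0) + 1
```

Comparing against the maximum shadow importance, rather than a fixed cutoff, gives a data-driven reference for what importance a purely random attribute can reach.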


Cited by 579 publications
(361 citation statements)
References 23 publications
“…Each category (annotated transcripts, lincRNAs and lncoaRNAs) of potential candidates passing the first differentiation expression filter were separated for feature selection analysis. Boruta 6.0 [28] was used with 10000 maximum runs and a pvalue of 0.01 on each category, with multiple comparisons adjustment using the Bonferroni method (mcAdj = TRUE). Candidates passing the boruta test as "Confirmed" for each category were selected as reliable biomarkers.…”
Section: Quantification With Pseudoalignment and Feature Selection
mentioning
confidence: 99%
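The settings quoted above (maximum runs, a p-value of 0.01, and Bonferroni adjustment via `mcAdj = TRUE`) presumably correspond to the `maxRuns`, `pValue` and `mcAdj` arguments of the R Boruta package. A simplified Python sketch of the kind of decision these settings control: after `runs` iterations, an attribute's hit count is tested against a Binomial(runs, 0.5) null, with p-values optionally multiplied by the number of attributes tested (Bonferroni). This is an illustration under those assumptions, not the package's exact code; `decide` and the tail helpers are hypothetical names.

```python
from math import comb


def binom_upper_tail(k: int, n: int) -> float:
    """P(X >= k) for X ~ Binomial(n, 0.5)."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n


def binom_lower_tail(k: int, n: int) -> float:
    """P(X <= k) for X ~ Binomial(n, 0.5)."""
    return sum(comb(n, i) for i in range(0, k + 1)) / 2 ** n


def decide(hits: dict[str, int], runs: int,
           p_value: float = 0.01, mc_adj: bool = True) -> dict[str, str]:
    """Label each attribute Confirmed, Rejected or Tentative from its hit count."""
    m = len(hits)  # number of attributes tested, used for the Bonferroni factor
    decisions = {}
    for name, h in hits.items():
        p_accept = binom_upper_tail(h, runs)  # evidence of too many hits for chance
        p_reject = binom_lower_tail(h, runs)  # evidence of too few hits for chance
        if mc_adj:
            p_accept = min(1.0, p_accept * m)
            p_reject = min(1.0, p_reject * m)
        if p_accept < p_value:
            decisions[name] = "Confirmed"
        elif p_reject < p_value:
            decisions[name] = "Rejected"
        else:
            decisions[name] = "Tentative"
    return decisions
```

For example, with 20 runs, an attribute that beat the best shadow every time is confirmed, one that never did is rejected, and one with 10 hits stays tentative; raising the run limit gives tentative attributes more chances to be resolved.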
“…Machine-learning model development started with feature selection using an R package, Boruta, a convenient interface to the algorithm, in which a panel of relevant cytokines was selected via the Boruta algorithm.[30,31] The Boruta method follows an all-relevant feature selection method, whereby it captures all features that are strongly or weakly relevant to the outcome variable clearance subgroups. The feature selection reduced the number of features during machine-learning model development, which potentially reduced overfitting and improved model generalizability.…”
Section: Machine-learning Model Development
mentioning
confidence: 99%
“…Firstly, using Spearman rank correlation coefficients, we eliminated bioclimatic variables with the highest and most significant correlation coefficients (|r| > 0.8 and p < 0.001) (Supporting information Figure S1). Then, Boruta, a wrapper built around the random forest classification algorithm implemented in the R, was used to select variables according to the relative importance of bioclimatic variables (Supporting information Figure S2) (Kursa, Jankowski, & Rudnicki, 2010 The spatial resolution of all variables was resampled into 1km to match those of the environmental variables (Supporting information Table S1 and S2) with the nearest-neighbor approach in ArcGIS 10.2.…”
Section: Data Sources
mentioning
confidence: 99%
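The pre-filtering step described in the last statement (dropping variables with |r| > 0.8 before running Boruta) can be sketched in plain Python. This sketch computes Spearman's rank correlation as the Pearson correlation of average ranks, then greedily keeps a variable only if it is not too strongly correlated with any variable already kept. The greedy keep-first rule and the function names are assumptions for illustration; the quoted study's exact rule for choosing which variable to eliminate is not given, and the p < 0.001 significance condition is omitted here. Columns are assumed to be non-constant, since a constant column has zero rank variance.

```python
from math import sqrt


def average_ranks(values: list[float]) -> list[float]:
    """Rank values 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x: list[float], y: list[float]) -> float:
    return pearson(average_ranks(x), average_ranks(y))


def drop_correlated(columns: dict[str, list[float]], threshold: float = 0.8) -> list[str]:
    """Keep each variable, in input order, only if |rho| <= threshold with all kept ones."""
    kept: list[str] = []
    for name in columns:
        if all(abs(spearman(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept
```

The surviving variables would then be passed to Boruta, which handles the remaining relevance question.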