2010
DOI: 10.18637/jss.v036.i11

Feature Selection with the Boruta Package

Abstract: This article describes an R package, Boruta, that implements a novel feature selection algorithm for finding all relevant variables. The algorithm is designed as a wrapper around a Random Forest classification algorithm. It iteratively removes the features that a statistical test shows to be less relevant than random probes. The Boruta package provides a convenient interface to the algorithm. A short description of the algorithm and examples of its application are presented.
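The loop described in the abstract (compare each feature against random probes, test the comparison statistically, drop the losers, repeat) can be illustrated with a minimal sketch. This is not the Boruta package's implementation. It makes several assumptions for the sake of a self-contained example: the importance measure is the absolute Pearson correlation with the label rather than a Random Forest importance, the probes are shuffled copies of the features, and the decision rule is a one-sided binomial test on how often a feature beats the best probe. The names `importance`, `binom_tail`, and `boruta_sketch` are hypothetical.

```python
import random
from math import comb


def importance(column, labels):
    """Stand-in importance: |Pearson correlation| with the labels.
    ASSUMPTION: the real algorithm uses Random Forest importance instead."""
    n = len(labels)
    mx, my = sum(column) / n, sum(labels) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(column, labels))
    vx = sum((x - mx) ** 2 for x in column)
    vy = sum((y - my) ** 2 for y in labels)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no information
    return abs(cov) / (vx * vy) ** 0.5


def binom_tail(hits, n, p=0.5):
    """P(X >= hits) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(hits, n + 1))


def boruta_sketch(columns, labels, max_runs=50, alpha=0.01, seed=0):
    """Return (confirmed, rejected, undecided) sets of feature names."""
    rng = random.Random(seed)
    undecided = set(columns)
    confirmed, rejected = set(), set()
    hits = {name: 0 for name in columns}
    for run in range(1, max_runs + 1):
        if not undecided:
            break
        # Random probes: a shuffled copy of every feature not yet rejected.
        shadow_max = 0.0
        for name in sorted(undecided | confirmed):
            shadow = columns[name][:]
            rng.shuffle(shadow)
            shadow_max = max(shadow_max, importance(shadow, labels))
        # A feature scores a "hit" when it beats the best probe.
        for name in undecided:
            if importance(columns[name], labels) > shadow_max:
                hits[name] += 1
        # Decide features whose hit count is significantly high or low.
        for name in sorted(undecided):
            if binom_tail(hits[name], run) < alpha:
                confirmed.add(name)
                undecided.discard(name)
            elif binom_tail(run - hits[name], run) < alpha:
                rejected.add(name)
                undecided.discard(name)
    return confirmed, rejected, undecided


# Toy data: one perfectly informative feature and one uninformative one.
labels = [0.0, 1.0] * 20
demo = {"signal": labels[:], "constant": [1.0] * 40}
confirmed, rejected, undecided = boruta_sketch(demo, labels)
print("confirmed:", confirmed, "rejected:", rejected, "undecided:", undecided)
```

Because the stand-in importance is deterministic, only the probes vary between runs; with a Random Forest the feature importances themselves would also fluctuate, which is part of why the comparison is repeated and tested rather than made once.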

Citations: cited by 3,758 publications (2,474 citation statements)
References: 9 publications
“…Predictive gene families that segregated significantly between soil layers or among harvesting treatments were identified using random forest analysis (Breiman, 2001) with 1000 trees followed by the Boruta algorithm for feature selection (Kursa and Rudnicki, 2010). Gene families were characterized using information from the CAZy website obtained on 15 August 2014 and previous publications on cellulases (Berlemont and Martiny, 2012) and hemicellulases (Shallom and Shoham, 2003; Zhao et al, 2013).…”
Section: Results (mentioning)
confidence: 99%
“…The wrapper function of RF identifies relevant parameters by performing multiple runs of the provided classification factors that test the performance of different subsets of the input parameters [45]. The RF function is a suitable alternative for the analysis of soil parameters at different depths, as it can be employed without extensive parameter tuning and returns an estimate of the feature's importance (Z-score) [46]. This parameter selection method is different from other dimensionality reduction techniques, such as PCA or other methods.…”
Section: Selection Of Soil Parameters Using the Random Forest Classifier (mentioning)
confidence: 99%
“…The optimization of the parameters chosen in the RF classifier was according to the overall accuracy and the efficiency (Figure A2). In order to assess the importance of input variables in the classification, we applied the Boruta algorithm for feature selection [45] via an iteration process (R package 'Boruta'). In the post classification, we manually filled the gaps due to residuals of cloud cover in mountain peaks by digitizing from the high-resolution aerial photos.…”
Section: Land Cover Mapping Post Classification and Land Change Ana (mentioning)
confidence: 99%