2020
DOI: 10.1101/2020.10.29.361360
Preprint

Forward variable selection improves the power of random forest for high-dimensional microbiome data

Abstract: A central focus of microbiome studies is the characterization of differences in the microbiome composition across groups of samples. A major challenge is the high dimensionality of microbiome datasets, which significantly reduces the power of current approaches for identifying true differences and increases the chance of false discoveries. We have developed a new framework to address these issues by combining (i) identifying a few significant features by a massively parallel forward variable selection procedur…
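The abstract is truncated here, so the authors' actual procedure is not shown. As a generic illustration of what greedy forward variable selection means, the following is a minimal sketch under stated assumptions: the function names `centroid_accuracy` and `forward_select`, the toy data, and the nearest-centroid scorer are all hypothetical stand-ins, not the preprint's method. It uses no random forest and no parallelism.

```python
from typing import List, Sequence, Tuple


def centroid_accuracy(X: Sequence[Sequence[float]], y: Sequence[int],
                      features: List[int]) -> float:
    """Training-set accuracy of a nearest-centroid classifier restricted to
    the given feature columns. A hypothetical stand-in scorer, chosen only
    to keep the sketch dependency-free."""
    classes = sorted(set(y))
    centroids = {}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        centroids[c] = [sum(r[f] for r in rows) / len(rows) for f in features]
    correct = 0
    for x, label in zip(X, y):
        point = [x[f] for f in features]
        # Predict the class whose centroid is closest (squared Euclidean).
        pred = min(classes, key=lambda c: sum(
            (p - q) ** 2 for p, q in zip(point, centroids[c])))
        correct += pred == label
    return correct / len(y)


def forward_select(X: Sequence[Sequence[float]], y: Sequence[int],
                   max_features: int = 3,
                   min_gain: float = 1e-9) -> Tuple[List[int], float]:
    """Greedy forward selection: repeatedly add the single feature that
    raises the score the most; stop when no feature adds at least min_gain."""
    selected: List[int] = []
    best = 0.0
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < max_features:
        candidates = [(centroid_accuracy(X, y, selected + [f]), f)
                      for f in remaining]
        # Ties are broken in favour of the lower feature index.
        top_score, top_f = max(candidates, key=lambda t: (t[0], -t[1]))
        if top_score - best < min_gain:
            break
        selected.append(top_f)
        remaining.remove(top_f)
        best = top_score
    return selected, best


# Toy data (invented): only column 0 separates the two groups.
X = [
    [0.1, 5.0, 0.3],
    [0.2, 4.0, 0.1],
    [0.9, 5.0, 0.2],
    [0.8, 4.0, 0.3],
]
y = [0, 0, 1, 1]
selected, best_score = forward_select(X, y)
print(selected, best_score)
```

On the toy data the loop keeps only the one informative column and stops once further columns no longer improve the score. In practice the score would come from a held-out or cross-validated estimate rather than training-set accuracy.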

Citations: cited by 3 publications (1 citation statement)
References: 61 publications
“…The taxonomic composition of the samples was visualized, reflecting the names and relative abundances of the most abundant genera for samples averaged by sample type, on barplots built in the fantaxtic package [68]. We used the Random Forest classifier [69,70] in the MicrobiomeAnalyst web-service [71] to identify taxa important to particular sample compositions. For each pairwise comparison, a constant random seed (123456) was used and 500 decision trees were constructed to draw the final result.…”
Section: Library Preparation and Sequencing
Citation type: mentioning
confidence: 99%