2021
DOI: 10.3389/fmicb.2020.607325

Effects of Rare Microbiome Taxa Filtering on Statistical Analysis

Abstract: Background: The accuracy of microbial community detection in 16S rRNA marker-gene and metagenomic studies suffers from contamination and sequencing errors that lead to either falsely identifying microbial taxa that were not in the sample or misclassifying the taxa of DNA fragment reads. Removing contaminants and filtering rare features are two common approaches to deal with this problem. While contaminant detection methods use auxiliary sequencing process information to identify known contaminants, filtering m…

Cited by 102 publications (90 citation statements)
References 39 publications
“…Moreover, we show that prevalence filtering can have additional disadvantages such as altering sample diversity rankings in a manner that may not be consistent across studies. Therefore, whilst filtering rare taxa can reduce technical bias (Cao et al, 2021), heavier filtering parameters may not always generate ecologically meaningful results that are comparable across the literature (although see Ainsworth et al, 2015; Grieneisen et al, 2017; Russell et al, 2019), and variation in sequencing depth after filtering may also bias weighted alpha and beta diversity scores unless further normalization methods are applied (e.g., Silverman et al, 2017; Beule and Karlovsky, 2020). Our results indicate that diversity measures that account for both abundance and phylogeny (BWPD and Weighted Unifrac, for alpha and beta diversity, respectively) are insensitive to prevalence thresholds, and therefore represent the common core microbiome without the need for filtering.…”
Section: Discussion (mentioning)
confidence: 99%
“…Quality filtering by excluding very low prevalence taxa (e.g., that occur in just a few samples) reduces effects of sequencing error (Bokulich et al, 2013; Callahan et al, 2016; Amir et al, 2017), although it is argued this method also excludes rare yet real taxa and therefore can bias results in other ways (Kozich et al, 2013; Jousset et al, 2017; Schloss, 2020). Statistical filtering, on the other hand, removes rare yet resident taxa (i.e., they are not due to sequencing error) and is recommended for many analyses, such as network and differential abundance analysis to increase their reliability (Röttjers and Faust, 2018; Cougoul et al, 2019; Cao et al, 2021). Statistical filters generally apply higher prevalence thresholds than quality filters, with taxa below ∼20% prevalence often being limited in their statistical testability, although this number is dependent on sample size (Cougoul et al, 2019).…”
Section: Introduction (mentioning)
confidence: 99%
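
The prevalence filtering described in the statement above can be illustrated with a minimal sketch. It assumes a small samples-by-taxa table of read counts held in plain Python lists; the function names (`taxon_prevalence`, `prevalence_filter`), the taxon labels, and the counts are hypothetical and are not taken from the cited papers or from any microbiome library. The 0.25 threshold is chosen only for illustration.

```python
def taxon_prevalence(counts):
    """Return, for each taxon, the fraction of samples with at least one read.

    counts: list of samples, each a list of per-taxon read counts
    (hypothetical layout: rows are samples, columns are taxa).
    """
    n_samples = len(counts)
    n_taxa = len(counts[0])
    return [
        sum(1 for sample in counts if sample[j] > 0) / n_samples
        for j in range(n_taxa)
    ]


def prevalence_filter(counts, taxa, min_prevalence=0.2):
    """Keep only taxa whose prevalence is at least min_prevalence."""
    prevalence = taxon_prevalence(counts)
    keep = [j for j, p in enumerate(prevalence) if p >= min_prevalence]
    kept_taxa = [taxa[j] for j in keep]
    filtered = [[sample[j] for j in keep] for sample in counts]
    return kept_taxa, filtered


# Made-up example: 5 samples, 4 taxa.
taxa = ["taxon_a", "taxon_b", "taxon_c", "taxon_d"]
counts = [
    [10, 0, 3, 0],
    [8, 0, 0, 0],
    [12, 1, 5, 0],
    [9, 0, 2, 0],
    [11, 0, 4, 0],
]

kept_taxa, filtered = prevalence_filter(counts, taxa, min_prevalence=0.25)
# Per-sample read depth changes after filtering, which is why the
# statements above mention renormalizing before weighted diversity scores.
depth_before = [sum(sample) for sample in counts]
depth_after = [sum(sample) for sample in filtered]
```

In this toy table, `taxon_b` appears in one of five samples and `taxon_d` in none, so both fall below the threshold and are dropped. With so few samples a single occurrence moves prevalence by 20 percentage points, which is one way to see why a suitable threshold depends on sample size.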
“…Specifically, we focused on the analysis of DNA metabarcoding of pollen in the framework of plant-pollinator interactions, being aware that the outputs of our investigation could be extended to the other typologies of DNA metabarcoding-based studies. Although the issue of removing false positives and rare taxa or features is quite neglected in the scientific literature in relation to the bioinformatic pipeline (but see [28,30]), the choices made when analysing a HTS output could generate relevant effects on the community composition, species richness, and species interactions. These aspects would deeply impact the ecological outcomes of the investigated system.…”
Section: Discussion (mentioning)
confidence: 99%
“…Given the extreme sensitivity of DNA metabarcoding, it is crucial to filter out false positives and contaminants, which could significantly alter the reconstruction of samples composition. Moreover, rare features or taxa should be treated consciously during the postsequencing bioinformatics processing and possibly removed, depending on the study aims and the required sensitivity of the analysis [27][28][29][30]. However, the resulting species composition of a sample could be biased by the disapplication or misapplication of cut-off thresholds.…”
Section: Introduction (mentioning)
confidence: 99%
“…Similar to existing studies [24][25][26][27][28], 16S rRNA gene (V3-V4 region) sequencing was used to investigate the microbiota content, although it may not be possible to achieve species-level differentiation. We formalized the task of CRC or adenoma prediction as a classification problem and focused on operational classification units (OTUs).…”
Section: /31 (mentioning)
confidence: 99%