2022
DOI: 10.1371/journal.pone.0275790

Drastic reduction of false positive species in samples of insects by intersecting the default output of two popular metagenomic classifiers

Abstract: The use of high-throughput sequencing to recover short DNA reads of many species has been widely applied on biodiversity studies, either as amplicon metabarcoding or shotgun metagenomics. These reads are assigned to taxa using classifiers. However, for different reasons, the results often contain many false positives. Here we focus on the reduction of false positive species attributable to the classifiers. We benchmarked two popular classifiers, BLASTn followed by MEGAN6 (BM) and Kraken2 (K2), to analyse shotg…
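
The intersection idea named in the title can be illustrated with a minimal sketch: keep only the species that both classifiers report. The function name, the name normalisation, and the species lists below are hypothetical and chosen for illustration; they are not taken from the paper, whose actual pipeline is not shown in this excerpt.

```python
def normalize(name):
    """Collapse whitespace and case so the same species matches across tools."""
    return " ".join(name.split()).lower()


def intersect_species(bm_species, k2_species):
    """Return the species reported by both classifiers, sorted by name.

    bm_species: species names from BLASTn + MEGAN6 (hypothetical input).
    k2_species: species names from Kraken2 (hypothetical input).
    The spelling from the first list is kept in the result.
    """
    bm = {normalize(s): s for s in bm_species}
    k2 = {normalize(s) for s in k2_species}
    return sorted(original for key, original in bm.items() if key in k2)


# Toy example lists, invented for illustration only.
bm_hits = ["Apis mellifera", "Bombus terrestris", "Drosophila melanogaster"]
k2_hits = ["apis  mellifera", "Drosophila melanogaster", "Tribolium castaneum"]

consensus = intersect_species(bm_hits, k2_hits)
print(consensus)
```

A species reported by only one tool is dropped, which trades some sensitivity for fewer false positives.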

citations
Cited by 8 publications
(6 citation statements)
references
References 53 publications
“…Without purposefully generated reference data for eDNA at hand for the surveyed region, we used a comprehensive public source of reference information, receiving data from many other initiatives, including 12S sequences (Collins et al, 2021; Pruesse et al, 2007), a currently frequently used fish primer set (e.g., see a recent evaluation in Zhu & Iwasaki, 2023), and relaxed taxonomic assignment parameters, to obtain the highest possible yield in identified eDNA species while minimizing missing assignments due to a limited search space (Gold et al, 2022). We did so at the cost of obtaining many less accurate eDNA assignments, which we inspected carefully, also due to shortcomings of taxonomic assignment algorithms (Garrido-Sanz et al, 2022; Somervuo et al, 2017). Our taxonomic assignments derived from eDNA are likely also influenced by transport and diffusion phenomena of suspended genetic material in the water column, when compared to other observation methods – marine eDNA can be transported over distances ranging to tens of kilometers and can persist for up to 2 weeks at low temperatures, with transport and diffusion playing a role in detectability (Andruszkiewicz et al, 2019; McCartin et al, 2022).…”
Section: Discussion
Citation type: mentioning
confidence: 99%
“…To evaluate eDNA taxonomic assignments obtained without local reference data, while keeping in mind potential misassignments after using relaxed taxonomic assignment parameters, we initially compared our taxonomic assignments to those obtained with MEGAN's (6.24.21) Least Common Ancestor (LCA) algorithm (Huson et al., 2007, 2016) and the same BLAST output files, expecting more, but partially less reliable assignments of our assignment method, minding that also MEGAN can yield false annotations if limited reference data are available (Garrido‐Sanz et al., 2022; Somervuo et al., 2017). Unlike many other studies, we also inspected alignment qualities, expecting them to be highly variable.…”
Section: Methods
Citation type: mentioning
confidence: 99%
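
The Least Common Ancestor idea mentioned in the statement above can be sketched as follows: when a read matches several reference taxa, it is assigned to the deepest taxon shared by all of them. This is a simplified illustration only. The function name and the root-to-species lineage lists are assumptions made for the example, and MEGAN's actual algorithm applies additional filters that are not modelled here.

```python
def lowest_common_ancestor(lineages):
    """Return the deepest taxon shared by all lineages, or None.

    Each lineage is a list of taxon names ordered from the broadest
    rank to the species (a hypothetical input format).
    """
    if not lineages:
        return None
    lca = None
    # Walk the ranks in parallel and stop at the first disagreement.
    for taxa_at_rank in zip(*lineages):
        if all(taxon == taxa_at_rank[0] for taxon in taxa_at_rank):
            lca = taxa_at_rank[0]
        else:
            break
    return lca


# Toy lineages for three gadid fishes, simplified to five ranks.
cod = ["Actinopterygii", "Gadiformes", "Gadidae", "Gadus", "Gadus morhua"]
pacific_cod = ["Actinopterygii", "Gadiformes", "Gadidae", "Gadus",
               "Gadus macrocephalus"]
haddock = ["Actinopterygii", "Gadiformes", "Gadidae", "Melanogrammus",
           "Melanogrammus aeglefinus"]

print(lowest_common_ancestor([cod, pacific_cod]))           # Gadus
print(lowest_common_ancestor([cod, pacific_cod, haddock]))  # Gadidae
```

A read that matches both cod species is reported at genus level, and adding a haddock match pushes the assignment up to the family. This shows why sparse or ambiguous reference data tends to yield either coarse or misleading assignments.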
“…In total, more than 7,000 organisms are represented in the database, with diverse representation of microbial species and strains. Despite any degree of database curation, metagenomics tools often result in spurious, false positive hits, which has historically made it difficult to use NGS in a clinical diagnostic setting (38)(39)(40)(41). To distinguish between clinically relevant infections, off-target spurious hits by the metagenomics software, or low levels of cross-contamination, we designed a classifier using ML that estimates the probability that a microbial organism is present in a sample.…”
Section: Clinical-grade Metagenomic Sequencing For Infectious Disease...
Citation type: mentioning
confidence: 99%
“…Controlling false positives remains a major challenge in environmental DNA analysis [64]. Misidentification hinders the reliability of DNA-based assessments of biodiversity [65]. A balanced discussion with consistent communication, controls, and limits of detection to clarify false positives is important for resolving misconceptions about false positives [66].…”
Section: Introduction
Citation type: mentioning
confidence: 99%