2019
DOI: 10.1101/848176
Preprint

To dereplicate or not to dereplicate?

Abstract: Our ability to reconstruct genomes from metagenomic datasets has rapidly evolved over the past decade, leading to publications presenting 1,000s, and even more than 100,000 metagenome-assembled genomes (MAGs) from 1,000s of samples. While this wealth of genomic data is critical to expand our understanding of microbial diversity, evolution, and ecology, various issues have been observed in some of these datasets that risk obfuscating scientific inquiry. In this perspective we focus on the issue of identical or …

Cited by 5 publications (5 citation statements)
References 34 publications
“…genomes [67]. However, since this issue would likely only underestimate the abundances of each clade, we report that T. thiebautii MAGs were recruiting at least 1-2 orders of magnitude more reads than T. erythraeum from TriCoLim colonies.…”
Section: One-quarter Of All Trichodesmium MAGs Have Shared Gene Clusters (mentioning)
confidence: 70%
“…Intra-clade average nucleotide identity (ANI) of the MAGs was very high. Thus, in situ quantification of each was not possible because of likely random read recruiting among high ANI genomes (55). However, since this issue would likely only underestimate the abundances of each clade, we report that T. thiebautii MAGs were recruiting at least 1-2 orders of magnitude more reads than T. erythraeum from TriCoLim colonies.…”
Section: Results (mentioning)
confidence: 99%
“…Dereplication dramatically simplifies downstream analysis when the input genomes come from different sources. 17 In the proposed workflow, filtered genomes (genomes that pass completeness, contamination and GUNC filters) are optionally dereplicated using dRep. 18 For each cluster, dRep reports, as the cluster representative, the best-scoring MAG using the CheckM's quality estimates.…”
Section: Methods (mentioning)
confidence: 99%
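
The selection step described in the statement above (filter genomes, then keep the best-scoring MAG per cluster) can be illustrated with a minimal Python sketch. It is not dRep's implementation: the cluster labels are assumed to be given already, the score `completeness - 5 * contamination` is a simplified stand-in for a quality score built on CheckM estimates, the thresholds are hypothetical, and the GUNC filter is not modelled. The function name `pick_representatives` and all input values are illustrative only.

```python
def pick_representatives(genomes, min_completeness=50.0,
                         max_contamination=10.0, contamination_weight=5.0):
    """Return {cluster: name of best-scoring genome} after quality filtering.

    Thresholds and the scoring weights are hypothetical illustrations,
    not values taken from the cited workflow or from dRep.
    """
    best = {}
    for g in genomes:
        # Quality filter: drop genomes that fail completeness/contamination.
        if g["completeness"] < min_completeness:
            continue
        if g["contamination"] > max_contamination:
            continue
        # Simplified quality score (assumption): reward completeness,
        # penalise contamination more heavily.
        score = g["completeness"] - contamination_weight * g["contamination"]
        current = best.get(g["cluster"])
        if current is None or score > current[0]:
            best[g["cluster"]] = (score, g["name"])
    return {cluster: name for cluster, (_score, name) in best.items()}


# Made-up example genomes; cluster labels stand in for ANI-based clustering.
genomes = [
    {"name": "MAG_A", "cluster": "c1", "completeness": 95.0, "contamination": 4.0},
    {"name": "MAG_B", "cluster": "c1", "completeness": 90.0, "contamination": 1.0},
    {"name": "MAG_C", "cluster": "c2", "completeness": 70.0, "contamination": 2.0},
    {"name": "MAG_D", "cluster": "c2", "completeness": 40.0, "contamination": 0.5},
]

representatives = pick_representatives(genomes)
print(representatives)
```

In this toy input, MAG_B is chosen over the more complete MAG_A because its lower contamination gives it the higher score, and MAG_D is dropped by the completeness filter before scoring.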
“…The workflow includes three Python3 custom scripts, designed to manipulate the output of the different steps. The scripts make use of NumPy, 17 Pandas and scikit-learn libraries.…”
Section: Methods (mentioning)
confidence: 99%