2020
DOI: 10.1186/s13059-020-02066-4

Exploring neighborhoods in large metagenome assembly graphs using spacegraphcats reveals hidden sequence diversity

Abstract: Genomes computationally inferred from large metagenomic data sets are often incomplete and may be missing functionally important content and strain variation. We introduce an information retrieval system for large metagenomic data sets that exploits the sparsity of DNA assembly graphs to efficiently extract subgraphs surrounding an inferred genome. We apply this system to recover missing content from genome bins and show that substantial genomic sequence variation is present in a real metagenome. Our software …
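To make "extracting the subgraph surrounding an inferred genome" concrete, here is a minimal sketch of a neighborhood query, assuming a plain forward-strand de Bruijn graph (distinct k-mers as nodes) and a neighborhood defined as every node within a fixed number of hops of the query k-mers. This is not spacegraphcats' algorithm or API: the names `kmers`, `build_dbg`, and `neighborhood` are hypothetical, reverse complements are ignored, and the paper's contribution is exploiting graph sparsity to make such queries efficient on large data sets, which the plain breadth-first search below does not attempt.

```python
from collections import deque


def kmers(seq, k):
    """Yield every overlapping k-mer of seq (forward strand only)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_dbg(reads, k):
    """Build an undirected, forward-strand de Bruijn graph.

    Nodes are the distinct k-mers in the reads; two k-mers are linked when
    the last k-1 bases of one equal the first k-1 bases of the other.
    """
    graph = {km: set() for read in reads for km in kmers(read, k)}
    for km in graph:
        for base in "ACGT":
            succ = km[1:] + base
            if succ in graph and succ != km:  # skip self-loops such as AAAA
                graph[km].add(succ)
                graph[succ].add(km)
    return graph


def neighborhood(graph, query_nodes, radius):
    """Return all nodes within `radius` hops of any query node.

    Multi-source breadth-first search; query k-mers absent from the
    graph are ignored.
    """
    dist = {}
    queue = deque()
    for node in query_nodes:
        if node in graph and node not in dist:
            dist[node] = 0
            queue.append(node)
    while queue:
        node = queue.popleft()
        if dist[node] == radius:
            continue  # do not expand past the requested radius
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return set(dist)


if __name__ == "__main__":
    graph = build_dbg(["AAACCCGGGTTT"], k=4)
    print(sorted(neighborhood(graph, {"CCGG"}, radius=1)))
    # ['CCCG', 'CCGG', 'CGGG']
```

Each query here visits every node in the neighborhood directly, which is the cost that grows prohibitive on very large metagenome graphs and that the indexing approach described in the abstract is designed to avoid.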

Cited by 39 publications (68 citation statements)
References 54 publications

“…Taxonomy of the bins was assigned using the Genome Taxonomy Database (GTDB-Tk) on KBase. GhostKOALA v2.2 and Prokka v1.11 were used to annotate genes in the cyanobacterial bin of interest [ 53 , 54 , 55 , 56 , 57 , 58 , 59 , 60 , 61 , 62 , 63 , 64 ]. To refine the bin, spacegraphcats was used to extract additional content of the bin with a k size of 21 [ 65 ]. The code used for the analyses presented here is available at , accessed on 1 March 2021.…”
Section: Methods (mentioning)
confidence: 99%
“…To summarise, RECAST algorithm offers an improved approach for analysis of FMT experiments metagenomic data which allows researchers to gain novel biological insights. The idea of reads sorting prior to the comparison of metagenomes with the common computational approaches enables to preserve more data by presenting it unchanged from, for example, compared to the metagenomic assembly, which can cause the loss of a significant part of information [Olekhnovich et al, 2018; Brown et al, 2020]. Using the RECAST algorithm, not only FMT data such as [Draper et al, 2018] can be analyzed, but also, for example, the colonization of the organism of children by the microbiota of mothers [Ferretti et al, 2018] or others.…”
Section: Discussion (mentioning)
confidence: 99%
“…Fortunately, workflow system coordination alleviates the need for a user to directly manage file interdependencies. For a larger analysis DAG, see [48]…”
Section: Workflow-based Project Management (mentioning)
confidence: 99%