2021
DOI: 10.1101/2021.01.23.426502
Preprint

Long-read metagenomics of soil communities reveals phylum-specific secondary metabolite dynamics

Abstract: Microbial biosynthetic gene clusters (BGCs) encoding secondary metabolites are thought to impact a plethora of biologically mediated environmental processes, yet their discovery and functional characterization in natural microbiomes remains challenging. Here we describe deep long-read sequencing and assembly of metagenomes from biological soil crusts, a group of soil communities that are rich in BGCs. Taking advantage of the unusually long assemblies produced by this approach, we recovered nearly 3,000 BGCs fo…

Cited by 8 publications (17 citation statements)
References 67 publications

“…For other environments such as brackish and lake waters, our work highlights that using the marky‐coco pipeline based on a single assembly approach provide similar results to a coassembly approach in detecting hgc genes. Long‐read metagenomic sequencing could help reduce discrepancies between coassembly and single assembly approaches (Driscoll et al, 2017; Van Goethem et al, 2021). However, long‐read approaches require high quality intact DNA and come with a trade‐off in base‐call accuracy and assembly coverage.…”
Section: Discussion
mentioning
confidence: 99%
“…A complete list of the counts are available in Supplemental Table S3. The hybrid approach also predicted more complete gene clusters (i.e., it is not truncated on either of the contig edges) than the assembly-only approach, 1,100 vs 712 (Van Goethem et al, 2021). The longest NRPS is novel (based on sequence similarity to the entire NCBI nr database) and is a full-length gene cluster of 79,925 bp.…”
Section: Metagenome Hybrid Clustering
mentioning
confidence: 97%
“…Biocrusts are specialized microbial communities consisting of primary producers, such as cyanobacteria, mosses, and lichens, and associated heterotrophs. They are aggregated organosedimentary communities that colonize and stabilize the soil surfaces of arid environments, preventing soil erosion and promoting nutrient status by fixing both atmospheric carbon and nitrogen (Van Goethem et al, 2021). The two ends of a Illumina short-read pair are 151 and 150bp.…”
Section: Datasets and Data Preprocessing
mentioning
confidence: 99%
“…One caveat of such analyses is that annotated BGCs often have incomplete sequences, potentially impacting annotation and quantification 3 . More importantly, gene-level data about BGCs inferred from MAGs cannot offer information about actual synthesis (e.g., gene expression), creating uncertainty about the distribution of secondary metabolites across environments 69 . Even with high-coverage gene expression data, currently lacking for most environments, the complex structural and modular nature of many secondary metabolites prevents their accurate association with the underlying genomic origins 10 .…”
mentioning
confidence: 99%
“…Importantly, gene-level data about BGCs inferred from MAGs fundamentally cannot provide information about actual synthesis (e.g., gene expression, enzyme activity and substrate availability), and cannot be used to infer the distribution of secondary metabolites in natural communities 2023 . Even with high-coverage gene expression data, currently lacking for most environments, the complex structural- and modular nature of many secondary metabolites prevents their accurate association with underlying genomic elements 24 .…”
mentioning
confidence: 99%