Mapping gene clusters within arrayed metagenomic libraries to expand the structural diversity of biomedically relevant natural products

Owen, Jeremy G.; Reddy, Boojala Vijay B.; Ternei, Melinda A.; Charlop–Powers, Zachary; Calle, Paula Y.; Kim, Jeffrey H.; Brady, Sean F.

doi:10.1073/pnas.1222159110

Cited by 122 publications

(128 citation statements)

References 34 publications


“…Our NPST data comprise ∼1 × 10^6 unique environmental sequences that were amplified from soil metagenomes using degenerate primers targeting two of the most common biosynthetic motifs: nonribosomal peptide synthetase (NRPS) adenylation (A) domains and polyketide synthase (PKS) ketosynthase (KS) domains (12). By targeting these very common biosynthetic domains, sequencing resources are focused on generating only data that are relevant to our search strategy, therefore the raw sequencing power required to generate this dataset was quite modest (∼1.5 Gbps) (8, 11). We estimate that the diversity of biosynthetic pathways represented in our NPST dataset is at least 50× larger than the NRPS and PKS pathways contained in all publically available sequenced bacterial genomes, as judged by the number of equivalent domains identified by recent systematic analyses (13–15).…”

Section: Results

mentioning

confidence: 99%

“…A convenient feature of the computational framework is the ability to identify overlapping clones that allow reconstruction of complete pathways by targeting multiple library wells containing the same NPST. The strategy of partially arraying libraries and generating barcoded NPSTs from each library well allows efficient storage and automated in silico screening of cloned metagenomes for diverse biomedically relevant BGCs, as well as facile recovery of entire BGCs identified in computational screens of NPST data (8,9).…”

Section: Recovery Sequencing and In Silico Analysis Of Epoxyketone

mentioning

confidence: 99%
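
The statement above describes locating overlapping clones by finding multiple library wells that contain the same NPST. A minimal sketch of that lookup follows. It assumes each barcoded read has already been reduced to a (well, tag) pair; the function names, well labels, and tag identifiers are hypothetical and are not taken from eSNaPD.

```python
from collections import defaultdict


def wells_by_tag(reads):
    """Group library wells by the NPST they contain.

    `reads` is an iterable of (well_barcode, npst_id) pairs.
    This input format is an assumption made for illustration.
    """
    index = defaultdict(set)
    for well, tag in reads:
        index[tag].add(well)
    return index


def multi_well_tags(index):
    """Return tags seen in more than one well, with their wells sorted.

    Such tags point to wells that may hold overlapping clones
    of the same gene cluster.
    """
    return {tag: sorted(wells) for tag, wells in index.items() if len(wells) > 1}


# Hypothetical example data: tag_1 was amplified from two different wells.
reads = [("A1", "tag_1"), ("B7", "tag_1"), ("C3", "tag_2"), ("B7", "tag_3")]
index = wells_by_tag(reads)
overlaps = multi_well_tags(index)
print(overlaps)
```

Wells listed together for one tag would be the candidates for clone recovery and sequencing; confirming that the clones actually overlap still requires the recovered sequence.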

“…Unfortunately, the large DNA contigs that these search strategies require as input are not readily available from complex metagenomes. In response to the need for a more robust metagenomic search strategy, our group recently developed an informatics platform called eSNaPD (8,9) (environmental Surveyor of Natural Product Diversity) with the specific aim of facilitating sequence-guided discovery of new bacterial natural products from complex metagenomes (Fig. 1).…”

mentioning

confidence: 99%

“…NPSTs are used to predict gene content and chemical output of the BGCs present in a metagenome, in a fashion analogous to reconstructing species phylogeny using 16S rRNA sequences (8). Once NPST data are generated from environmental metagenomes or metagenomic libraries, eSNaPD searches each NPST against a curated reference database, and identifies NPSTs whose closest evolutionary relative among all previously characterized reference BGCs encodes a molecule of interest.…”

mentioning

confidence: 99%

“…Once NPST data are generated from environmental metagenomes or metagenomic libraries, eSNaPD searches each NPST against a curated reference database, and identifies NPSTs whose closest evolutionary relative among all previously characterized reference BGCs encodes a molecule of interest. This "closest relative" search approach is computationally inexpensive; however, the output it provides is a robust predictor of pathway gene content and chemical output (8).…”

mentioning

confidence: 99%
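
The "closest relative" search described in the statements above can be sketched as follows. This is an illustration under stated assumptions, not the eSNaPD implementation: similarity is scored with Python's `difflib.SequenceMatcher` as a stand-in for a proper sequence alignment, and the reference entries, sequences, and product classes are invented for the example.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Crude sequence similarity in [0, 1]; a stand-in for an alignment score."""
    return SequenceMatcher(None, a, b).ratio()


def closest_relative(tag_seq, references):
    """Return (reference_name, product_class) of the most similar reference.

    `references` maps a name to (sequence, product_class).
    This layout is hypothetical.
    """
    best = max(references, key=lambda name: similarity(tag_seq, references[name][0]))
    return best, references[best][1]


def tags_of_interest(tags, references, target_class):
    """Keep tags whose closest reference encodes the molecule class of interest."""
    hits = []
    for tag_id, seq in tags.items():
        name, product_class = closest_relative(seq, references)
        if product_class == target_class:
            hits.append((tag_id, name))
    return hits


# Invented reference database and tags, for illustration only.
references = {
    "ref_epoxyketone": ("ATGGCCGTTAACGGT", "epoxyketone"),
    "ref_other": ("TTTTCCCCAAAAGGG", "other"),
}
tags = {"tag_a": "ATGGCCGTTAACGGA", "tag_b": "TTTTCCCCAAAAGGC"}

hits = tags_of_interest(tags, references, "epoxyketone")
print(hits)
```

Only the single best-matching reference is consulted for each tag, which is what keeps this kind of search computationally inexpensive.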


Multiplexed metagenome mining using short DNA sequence tags facilitates targeted discovery of epoxyketone proteasome inhibitors

Owen, Charlop–Powers, Smith et al. 2015

Proc. Natl. Acad. Sci. U.S.A.


In molecular evolutionary analyses, short DNA sequences are used to infer phylogenetic relationships among species. Here we apply this principle to the study of bacterial biosynthesis, enabling the targeted isolation of previously unidentified natural products directly from complex metagenomes. Our approach uses short natural product sequence tags derived from conserved biosynthetic motifs to profile biosynthetic diversity in the environment and then guide the recovery of gene clusters from metagenomic libraries. The methodology is conceptually simple, requires only a small investment in sequencing, and is not computationally demanding. To demonstrate the power of this approach to natural product discovery we conducted a computational search for epoxyketone proteasome inhibitors within 185 globally distributed soil metagenomes. This led to the identification of 99 unique epoxyketone sequence tags, falling into 6 phylogenetically distinct clades. Complete gene clusters associated with nine unique tags were recovered from four saturating soil metagenomic libraries. Using heterologous expression methodologies, seven potent epoxyketone proteasome inhibitors (clarepoxcins A-E and landepoxcins A and B) were produced from these pathways, including compounds with different warhead structures and a naturally occurring halohydrin prodrug. 
This study provides a template for the targeted expansion of bacterially derived natural products using the global metagenome.

The advent of cost-effective high-throughput sequencing and an increasingly sophisticated understanding of bacterial secondary metabolite biosynthesis have led to two important revelations with respect to the search for new natural products: first, that the biosynthetic potential of most cultured bacteria, as judged by the number of biosynthetic gene clusters (BGCs) observed in sequenced genomes, is far greater than previously estimated (1, 2); second, that the number of bacterial species in most environments is at least 100× greater than the number of species that is readily cultured (3, 4). These observations suggest that conventional "phenotype-first" natural products isolation approaches have only examined a small fraction of earth's bacterial biosynthetic potential.

There are now a number of genomic search engines available that allow researchers to rapidly scan microbial whole genome sequences for BGCs encoding new natural products (5-7). Unfortunately, the large DNA contigs that these search strategies require as input are not readily available from complex metagenomes. In response to the need for a more robust metagenomic search strategy, our group recently developed an informatics platform called eSNaPD (8, 9) (environmental Surveyor of Natural Product Diversity) with the specific aim of facilitating sequence-guided discovery of new bacterial natural products from complex metagenomes (Fig. 1).

The eSNaPD software is designed to bioinformatically assess short DNA sequences that have been amplified from environmental metagenomes by degenerate PCR targeting c...
