2020
DOI: 10.1101/2020.05.19.104752
Preprint

Integration of machine learning and pan-genomics expands the biosynthetic landscape of RiPP natural products

Abstract: Most clinical drugs are based on microbial natural products, with compound classes including polyketides (PKS), non-ribosomal peptides (NRPS), fluoroquinones and ribosomally synthesized and post-translationally modified peptides (RiPPs). While variants of biosynthetic gene clusters (BGCs) for known classes of natural products are easy to identify in genome sequences, BGCs for new compound classes escape attention. In particular, evidence is accumulating that for RiPPs, subclasses known thus fa…

citations
Cited by 9 publications
(18 citation statements)
references
References 75 publications
“…The variety of tailoring enzymes and precursor peptide sequences indicates that the products will be highly diverse. This is supported by the parallel identification of HopA1-containing BGCs by the decRiPPter algorithm 53, which have been recently defined as lanthidins in antiSMASH 5.0 61.…”
Section: Results (mentioning)
confidence: 96%
“…The precursor peptides in this network share higher conservation on their likely leader N-terminal region than in their C-terminal region (Figure S27), although the C-terminus contains a Ser/Ala rich region and two highly conserved Thr and Cys residues (Figure 8A), which supports the theory that these BGCs produce diverse AviMeCys containing RiPPs. In parallel with our study, a new RiPP genome mining algorithm, decRiPPter, also identifies the discovery of a similar set of actinobacterial RiPP BGCs encoding HopA1-like proteins and phosphotransferases 53. Figure S26).…”
Section: Genome Mining Reveals That the HopA1 And Phosphotransferase (mentioning)
confidence: 83%
“…283 In a departure from sequence-based methods, decRiPPter (Data-driven Exploratory Class-independent RiPP TrackER) was developed for the explicit purpose of detecting new RiPP classes without relying on homology to known RiPP classes or enzymatic machinery. 284 The core filtering step of the decRiPPter algorithm uses pan-genomic comparisons to detect operons that are sparsely distributed within taxonomic groups and thus are likely involved in secondary rather than primary metabolic functions. Kloosterman et al. analyzed 1295 Streptomyces genomes with decRiPPter to identify a new family of RiPP maturases catalyzing dehydration and cyclization reactions for a new lanthipeptide class of natural products.…”
Section: Sequence-independent Methods (mentioning)
confidence: 99%
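The filtering idea described in the statement above (keeping operons whose genes are sparsely distributed across the genomes of one taxonomic group) can be illustrated with a toy sketch. This is not the decRiPPter implementation: the function names, the mean-frequency score, the 0.5 cutoff, and the example gene families are all assumptions made for illustration.

```python
from collections import defaultdict

def gene_frequencies(genomes):
    """Fraction of genomes in the group that carry each gene family.

    `genomes` maps a genome ID to the list of gene families found in it
    (hypothetical input format, assumed for this sketch).
    """
    counts = defaultdict(int)
    for families in genomes.values():
        for fam in set(families):  # count each family once per genome
            counts[fam] += 1
    n = len(genomes)
    return {fam: c / n for fam, c in counts.items()}

def sparse_operons(operons, freqs, max_mean_freq=0.5):
    """Keep operons whose mean gene-family frequency is at or below the cutoff.

    The mean-frequency score and the 0.5 cutoff are illustrative assumptions.
    """
    kept = []
    for name, families in operons.items():
        mean_freq = sum(freqs.get(f, 0.0) for f in families) / len(families)
        if mean_freq <= max_mean_freq:
            kept.append(name)
    return kept

# Toy data: four genomes from one taxonomic group (invented for illustration).
genomes = {
    "g1": ["rpoB", "gyrA", "famX", "famY"],
    "g2": ["rpoB", "gyrA"],
    "g3": ["rpoB", "gyrA"],
    "g4": ["rpoB", "gyrA", "famX"],
}
operons = {
    "core_operon": ["rpoB", "gyrA"],       # present in every genome
    "candidate_operon": ["famX", "famY"],  # present in only some genomes
}

freqs = gene_frequencies(genomes)
result = sparse_operons(operons, freqs)
print(result)
```

In this toy set, the operon built from universally present families is filtered out, while the operon built from rarely occurring families is retained as a candidate for secondary metabolism.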
“…22 With regards to this most wanted list, it is interesting to note that biosynthetic enzymes often have a more discontinuous taxonomic distribution than primarily metabolic enzymes. 25,26 Therefore the remaining 111 312 protein domains not on the list with a sparser taxonomic distribution may actually be of greater interest for the natural products community. Regarding de novo discovery of enzymes with new structural folds, the Baker lab recently used metagenomic sequences to model more than 614 protein families with unknown structures, 137 of which have completely new protein folds.…”
Section: Definitions For Enzyme Discovery (mentioning)
confidence: 99%
“…93 Within the natural product sciences, the potential applications are extensive and include recognition of genomic signature elements and predictions about collective outcomes biosynthetically, projections of bioactivity, propositions for (bio)synthetic compound diversity, and disease targeting. 248 Deep learning (DL) is an extension of ML and focuses on layers of neural networks that can assist in predicting protein structures. 249,250 DL has provided next-stage analysis of quantitative structure activity relationship (QSAR) data for mutagenesis, 251 for diverse biological activities using global libraries, 252 and identified the compound halicin as a structurally new, and mechanistically different antibiotic, along with eight additional candidate compounds.…”
Section: Machine Learning (mentioning)
confidence: 99%