Computing sequence similarity is a fundamental task in biology, with alignment forming the basis for the annotation of genes and genomes and providing the core data structures for evolutionary analysis. Standard approaches are a mainstay of modern molecular biology and rely on variations of edit distance to obtain explicit alignments between pairs of biological sequences. However, sequence alignment algorithms struggle with remote homology tasks and cannot identify similarities between many pairs of proteins that have similar structures and are likely homologous. Recent work suggests that machine learning language models can improve remote homology detection. To this end, we introduce DeepBLAST, which obtains explicit alignments from residue embeddings learned by a protein language model integrated into an end-to-end differentiable alignment framework. This approach can be accelerated on GPU architectures and outperforms conventional sequence alignment techniques in both speed and accuracy when identifying structurally similar proteins.
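The idea of a differentiable alignment can be illustrated with a soft Needleman-Wunsch recursion, in which the hard max over the match and gap moves is replaced by a smoothed max (log-sum-exp with temperature `gamma`). This is a minimal sketch of the general technique, not DeepBLAST's implementation: the toy two-dimensional vectors stand in for language-model residue embeddings, match scores are plain dot products, and the linear gap penalty and the names `nw_score` and `smooth_max` are hypothetical choices made for this example.

```python
import math
from typing import List, Optional, Sequence

Embedding = Sequence[float]  # one vector per residue (toy stand-in for language-model embeddings)


def smooth_max(values: List[float], gamma: float) -> float:
    """Log-sum-exp smoothed max: gamma * log(sum(exp(v / gamma))), always >= max(values)."""
    m = max(values)  # shift by the max for numerical stability
    return m + gamma * math.log(sum(math.exp((v - m) / gamma) for v in values))


def dot(a: Embedding, b: Embedding) -> float:
    """Match score between two residues: dot product of their embeddings."""
    return sum(p * q for p, q in zip(a, b))


def nw_score(x: Sequence[Embedding], y: Sequence[Embedding],
             gap: float = -1.0, gamma: Optional[float] = None) -> float:
    """Global alignment score of two embedded sequences.

    gamma=None uses the hard max (classic Needleman-Wunsch); gamma > 0 uses the
    smoothed max, which makes the score differentiable in the match scores.
    """
    n, m = len(x), len(y)
    V = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        V[i][0] = i * gap  # leading gaps in y
    for j in range(1, m + 1):
        V[0][j] = j * gap  # leading gaps in x
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            options = [
                V[i - 1][j - 1] + dot(x[i - 1], y[j - 1]),  # align residue i with residue j
                V[i - 1][j] + gap,                          # residue i of x against a gap
                V[i][j - 1] + gap,                          # residue j of y against a gap
            ]
            V[i][j] = max(options) if gamma is None else smooth_max(options, gamma)
    return V[n][m]


# Toy one-hot "embeddings" (hypothetical, for illustration only).
x = [[1, 0], [0, 1], [1, 0]]
y = [[1, 0], [1, 0]]
hard = nw_score(x, y)              # classic dynamic-programming score
soft = nw_score(x, y, gamma=0.01)  # smoothed score; approaches `hard` as gamma -> 0
print(hard, soft)
```

For this toy pair the hard score is 1.0, and the smoothed score is slightly higher, converging to it as `gamma` shrinks. In an autodiff framework, the gradient of the soft score with respect to the match scores can be read as a soft alignment matrix, which is what allows end-to-end training; this pure-Python forward pass omits that step.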
Microbes produce an array of secondary metabolites that perform diverse functions, from communication to defense. These metabolites have been used to benefit human health and sustainability. In their analysis of the Genomes from Earth's Microbiomes (GEM) catalog, Nayfach and co-authors observed that, whereas genes coding for certain classes of secondary metabolites are limited or enriched in certain microbial taxa, "specific chemistry is not limited or amplified by the environment, and that most classes of secondary metabolites can be found nearly anywhere". Although metagenome mining is a powerful way to annotate biosynthetic gene clusters (BGCs), chemical evidence is required to confirm the presence of metabolites and comprehensively address this fundamental hypothesis, because metagenomic data identify only metabolic potential. To describe the Earth's metabolome, we use an integrated omics approach: the direct survey of metabolites associated with microbial communities spanning diverse environments, using untargeted metabolomics coupled with metagenome analysis. We show, in contrast to Nayfach and co-authors, that the presence of certain classes of secondary metabolites can be limited or amplified by the environment. Importantly, our data indicate that considering the relative abundances of secondary metabolites (i.e., rather than only presence/absence) accentuates differences in metabolite profiles across environments, and that metabolite richness and composition in any given sample do not directly reflect those of the co-occurring microbial communities, but rather vary with the environment.
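Why relative abundances can sharpen differences that presence/absence data hide can be shown with two standard dissimilarity measures. The sketch assumes Jaccard distance for presence/absence and Bray-Curtis dissimilarity for abundances, which are common choices in community ecology but are not stated in the abstract; the sample profiles and class names are hypothetical.

```python
from typing import Dict

Profile = Dict[str, float]  # feature (e.g., metabolite class) -> relative abundance


def jaccard_distance(a: Profile, b: Profile) -> float:
    """Presence/absence dissimilarity: 1 - |shared features| / |all detected features|."""
    present_a = {k for k, v in a.items() if v > 0}
    present_b = {k for k, v in b.items() if v > 0}
    union = present_a | present_b
    if not union:
        return 0.0  # two empty profiles are treated as identical
    return 1.0 - len(present_a & present_b) / len(union)


def bray_curtis(a: Profile, b: Profile) -> float:
    """Abundance-weighted dissimilarity: sum |a_k - b_k| / sum (a_k + b_k)."""
    keys = set(a) | set(b)
    num = sum(abs(a.get(k, 0.0) - b.get(k, 0.0)) for k in keys)
    den = sum(a.get(k, 0.0) + b.get(k, 0.0) for k in keys)
    return num / den if den else 0.0


# Hypothetical profiles: the same three classes are detected in both samples,
# but their relative abundances are reversed.
sample_1 = {"class_A": 0.7, "class_B": 0.2, "class_C": 0.1}
sample_2 = {"class_A": 0.1, "class_B": 0.2, "class_C": 0.7}
print(jaccard_distance(sample_1, sample_2))  # 0.0: identical presence/absence
print(bray_curtis(sample_1, sample_2))       # about 0.6: abundances differ
```

On presence/absence alone the two samples look identical, while the abundance-weighted measure separates them; real analyses would typically compute such distances across many samples before ordination or statistical testing.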
Ribosomally synthesized and post-translationally modified peptides (RiPPs) are an important class of natural products that includes many antibiotics and a variety of other bioactive compounds. While recent breakthroughs in RiPP discovery have raised the challenge of developing new algorithms for their analysis, peptidogenomics-based identification of RiPPs, which combines genome/metagenome mining with analysis of tandem mass spectra, remains an open problem. We present MetaRiPPquest, a software tool that addresses this challenge and is compatible with large-scale screening platforms for natural product discovery. After searching millions of spectra in the Global Natural Products Social (GNPS) molecular networking infrastructure against just six genomic and metagenomic datasets, MetaRiPPquest identified 27 known and discovered 5 novel RiPP natural products.
... spectra of natural products. However, to transform natural product discovery into a high-throughput technology and to fully realize the promise of the GNPS project, new algorithms for natural product discovery are needed 6-10. Indeed, while spectra in the GNPS molecular network represent a gold mine for future chemical discoveries, their interpretation remains a bottleneck due to the large volume of data produced by modern mass spectrometers and the lack of computational platforms for processing these data.
The efforts presented here focus on Ribosomally synthesized and Post-translationally modified Peptides (RiPPs), a rapidly expanding group of natural products with applications in the pharmaceutical and food industries 11. RiPPs are produced by RiPP Synthetases (RiPPS) through the Post Ribosomal Peptide Synthesis (PRPS) pathway 11. RiPPs are initially synthesized as precursor peptides, encoded by RiPP structural genes. The RiPP structural genes are often quite short, which makes their annotation difficult 12. A precursor peptide consists of a prefix leader peptide appended to a suffix core peptide.
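Peptidogenomics links a genome-predicted peptide to a mass spectrum, and one elementary step is checking whether a candidate core peptide, with some number of modifications, explains an observed precursor mass. The sketch below shows only that precursor-mass filter, under explicit assumptions: Ser/Thr dehydration (as in lanthipeptides) is the only modification considered, the 10 ppm tolerance is arbitrary, and the peptide, the function names, and the simulated m/z value are hypothetical. It is not MetaRiPPquest's algorithm, which, per the abstract, also analyzes the tandem mass spectra themselves.

```python
from typing import Dict, List

# Standard monoisotopic residue masses (Da) of the 20 proteinogenic amino acids.
RESIDUE_MASS: Dict[str, float] = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # monoisotopic mass of H2O; also the mass lost per dehydration
PROTON = 1.00728   # mass added per positive charge


def peptide_mass(core: str) -> float:
    """Neutral monoisotopic mass of an unmodified linear peptide."""
    return sum(RESIDUE_MASS[aa] for aa in core) + WATER


def match_dehydrations(core: str, observed_mz: float, charge: int = 1,
                       tol_ppm: float = 10.0) -> List[int]:
    """Return every dehydration count k whose predicted [M+zH]z+ m/z matches observed_mz.

    Assumes the only modification is Ser/Thr dehydration (-H2O each), so k ranges
    from 0 to the number of S and T residues in the candidate core peptide.
    """
    base = peptide_mass(core)
    max_k = core.count("S") + core.count("T")
    hits = []
    for k in range(max_k + 1):
        predicted_mz = (base - k * WATER + charge * PROTON) / charge
        if abs(predicted_mz - observed_mz) / predicted_mz * 1e6 <= tol_ppm:
            hits.append(k)
    return hits


# Hypothetical candidate core peptide and a simulated doubly charged precursor
# carrying two dehydrations.
core = "ASTSGK"
observed = (peptide_mass(core) - 2 * WATER + 2 * PROTON) / 2
print(match_dehydrations(core, observed, charge=2))  # [2]
```

A real search would enumerate many modification types and candidate cores per BGC, and would use the fragment ions, not just the precursor mass, to rank matches.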
A leader peptide is important for recognition by the RiPP post-translational modification enzymes and for export of the RiPP from the cell. The core peptide is post-translationally modified by the RiPP biosynthetic machinery, proteolytically cleaved from the leader peptide to yield the mature RiPP, and exported out of the cell by transporters. The precursor peptide and the enzymes responsible for post-translational modifications (PTMs), proteolytic cleavage, and transport usually appear in a contiguous biosynthetic gene cluster (BGC) of a RiPP within a microbial genome. Microbial RiPP-encoding BGCs typically range from 1,000 to 40,000 bp in length (average 10,000 bp), longer than the short reads currently generated by next-generation sequencing (350 bp), which makes DNA assembly a critical part of any short-read-based RiPP discovery method.
Genome mining refers to the informatics-based structural interpretation of a natural product BGC to infer information about the natural product itself. The discoveries of coelichelin in Streptomyces coelicolo...
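The leader/core architecture of a precursor peptide can be made concrete with a toy cleavage rule. This sketch assumes a double-glycine (GG) motif at the end of the leader, a pattern seen in some RiPP families (for example, many class II bacteriocins); other RiPP classes are processed differently, so the motif, the 40-residue search window, the example sequence, and the function name `split_precursor` are all hypothetical.

```python
from typing import Optional, Tuple


def split_precursor(precursor: str, motif: str = "GG",
                    max_leader_len: int = 40) -> Optional[Tuple[str, str]]:
    """Split a precursor peptide into (leader, core) immediately after a cleavage motif.

    Only the first `max_leader_len` residues are searched, since the leader is the
    N-terminal prefix. Returns None when no motif is found there.
    """
    idx = precursor.find(motif, 0, max_leader_len)
    if idx == -1:
        return None
    cut = idx + len(motif)  # assumed protease cut site: right after the motif
    return precursor[:cut], precursor[cut:]


# Hypothetical precursor: leader ending in GG, followed by the core peptide.
print(split_precursor("MSKLEELGGVTSCAK"))  # ('MSKLEELGG', 'VTSCAK')
print(split_precursor("MSKAV"))            # None
```

In practice the core would then be passed to modification enumeration and spectral matching, and locating the short structural gene in an assembled BGC is itself a separate, harder step.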