Antibiotic efficacy can be antagonized by bioactive metabolites and other drugs present at infection sites. Pseudomonas aeruginosa, a common cause of biofilm-based infections, releases metabolites called phenazines that accept electrons to support cellular redox balancing. Here, we find that phenazines promote tolerance to clinically relevant antibiotics, such as ciprofloxacin, in P. aeruginosa biofilms and that this effect depends on the carbon source provided for growth. We couple stable isotope labeling with stimulated Raman scattering microscopy to visualize biofilm metabolic activity in situ. This approach shows that phenazines promote metabolism in microaerobic biofilm regions and influence metabolic responses to ciprofloxacin treatment. Consistent with roles of specific respiratory complexes in supporting phenazine utilization in biofilms, phenazine-dependent survival on ciprofloxacin is diminished in mutants lacking these enzymes. Our work introduces a technique for the chemical imaging of biosynthetic activity in biofilms and highlights complex interactions between bacterial products, their effects on biofilm metabolism, and the antibiotics we use to treat infections.
Unannotated, or orphan, enzymes vastly outnumber enzymes whose substrates have known chemical structures. Although a number of enzyme function prediction algorithms exist, these often predict Enzyme Commission (EC) numbers or enzyme families, which limits their ability to generate experimentally testable hypotheses. Here, we harness protein language models, cheminformatics, and machine learning classification techniques to accelerate the annotation of orphan enzymes by predicting the chemical structural class of their substrates. We use the orphan enzymes of Mycobacterium tuberculosis as a case study, focusing on two protein families that are highly abundant in its proteome: the short-chain dehydrogenase/reductases (SDRs) and the S-adenosylmethionine (SAM)-dependent methyltransferases. Machine learning classification models trained on protein sequence embeddings from a pre-trained, self-supervised protein language model achieve excellent accuracy across a wide variety of prediction tasks. These include redox cofactor preference for SDRs; small-molecule vs. polymer (i.e., protein, DNA, or RNA) substrate preference for SAM-dependent methyltransferases; and more detailed chemical structural predictions for the preferred substrates of both enzyme families. We then use these trained classifiers to generate predictions for the full set of unannotated SDRs and SAM-dependent methyltransferases in the proteomes of M. tuberculosis and other mycobacteria, yielding a set of biochemically testable hypotheses. Our approach can be extended and generalized to other enzyme families and organisms, and we envision that it will help accelerate the annotation of a large number of orphan enzymes.
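The classification step can be illustrated with a minimal sketch. It assumes each protein has already been converted into a fixed-length embedding vector by a protein language model, and it uses a simple nearest-centroid classifier with cosine similarity as a stand-in for whichever classification models the study actually trained. The functions `train_centroids`, `cosine`, and `predict` and the toy vectors are hypothetical; the NAD/NADP labels only mirror the SDR cofactor-preference task.

```python
import math
from collections import defaultdict


def train_centroids(embeddings, labels):
    """Average the embedding vectors of each class into one centroid per label."""
    sums = {}
    counts = defaultdict(int)
    for vec, label in zip(embeddings, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def predict(centroids, vec):
    """Return the label whose centroid is most cosine-similar to vec."""
    return max(centroids, key=lambda label: cosine(centroids[label], vec))


# Toy 3-dimensional "embeddings", made up for illustration; real protein
# language-model embeddings have hundreds to thousands of dimensions.
train_x = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1], [0.1, 0.9, 0.2], [0.0, 0.8, 0.3]]
train_y = ["NAD", "NAD", "NADP", "NADP"]
centroids = train_centroids(train_x, train_y)
print(predict(centroids, [0.7, 0.3, 0.0]))  # NAD
```

A nearest-centroid rule keeps the sketch dependency-free; in practice a regularized linear model or gradient-boosted classifier on the same embeddings would be a more typical choice.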
Genome-wide random mutagenesis screens using transposon sequencing (TnSeq) have been a cornerstone of functional genetics in Mycobacterium tuberculosis (Mtb) for decades. TnSeq has been used to identify essential genes across a wide diversity of experimental conditions. We recently compiled a large number of standardized TnSeq screens, opening up the possibility of systematically searching for pairwise correlations in gene conditional essentiality profiles (known as co-essentiality or co-fitness analysis) to reveal clusters of genes with similar functions. Here, we combine the Mtb TnSeq database with a recent statistical method for detecting co-fitness correlations, based on generalized least squares (GLS), to search for significant co-essentiality signals across the Mtb genome. We find strong co-essential clusters containing both well-established and novel protein complexes and metabolic modules. We describe selected functional modules identified by our GLS pipeline, review the literature supporting their associations, and propose hypotheses about novel associations. We then focus on a strongly correlated cluster of seven enzymes for downstream experimental validation, characterizing it as an enzymatic arsenal that helps Mtb counter the toxic effects of itaconate, one of the host's key antibacterial compounds. Finally, using AlphaFold2 structure predictions and a protein complex quality scoring function, we design a virtual screen to detect and rank potential interacting heterodimers among protein pairs with strongly correlated TnSeq profiles. Because co-essentiality analysis of the Mtb genome could help accelerate gene functional discovery in this important human pathogen, we share our repository of correlation signals with the research community.
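The core idea of co-fitness analysis, that genes with related functions tend to have correlated fitness profiles across screens, can be sketched with plain Pearson correlation. This is a deliberate simplification: it does not reproduce the GLS (generalized least squares) method, which additionally accounts for the non-independence of related screens. The profile values, gene names, the functions `pearson` and `cofit_pairs`, and the 0.8 threshold are all hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length fitness profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile carries no co-fitness signal
    return cov / (sx * sy)


def cofit_pairs(profiles, threshold=0.8):
    """Return gene pairs whose profiles correlate at or above threshold, strongest first."""
    hits = []
    for g1, g2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[g1], profiles[g2])
        if r >= threshold:
            hits.append((g1, g2, r))
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


# Hypothetical per-screen fitness values (e.g. log2 fold changes), one entry per screen.
profiles = {
    "geneA": [-2.0, 0.1, -1.5, 0.0],
    "geneB": [-1.8, 0.2, -1.4, 0.1],
    "geneC": [0.3, -0.2, 0.1, 1.0],
}
for g1, g2, r in cofit_pairs(profiles):
    print(g1, g2, round(r, 3))
```

Here only geneA and geneB pass the threshold, because their fitness drops occur in the same screens. With roughly 4,000 Mtb genes the all-pairs loop evaluates about eight million pairs, so a real pipeline would usually vectorize this step.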
Visualizing relationships and similarities between proteins can reveal biological insights. Current approaches to visualizing and analyzing proteins based on sequence homology, such as sequence similarity networks (SSNs), create representations of BLAST-based pairwise comparisons. These approaches could benefit from incorporating recent protein language models, which generate high-dimensional vector representations of protein sequences through self-supervised learning on hundreds of millions of proteins. Inspired by SSNs, we developed an interactive tool, Protein Language UMAPs (PLUMAPs), to visualize protein similarity using protein language models, dimensionality reduction, and topic modeling. As a case study, we compare our tool with SSNs using the proteomes of two related bacterial species, Mycobacterium tuberculosis and Mycobacterium smegmatis. Both SSNs and PLUMAPs generate protein clusters corresponding to protein families and highlight enrichment or depletion across species. However, only in PLUMAPs does the layout distance between proteins and protein clusters meaningfully reflect similarity. Thus, in PLUMAPs, related protein families are displayed as nearby clusters, and larger-scale structures correlate with cellular localization. Finally, we adapt techniques from topic modeling to automatically annotate protein clusters, making them easier to interpret and potentially more insightful. We envision that as large protein language models permeate bioinformatics and interactive sequence analysis tools, PLUMAPs will become a useful visualization resource across a wide variety of biological disciplines. Anticipating this, we provide a prototype of an online, open-source version of PLUMAPs.
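Automatic cluster annotation can be approximated by treating each cluster's pooled protein annotations as a single document and ranking words by TF-IDF, so that words shared by every cluster (such as "protein") score zero. This is similar in spirit to class-based TF-IDF as used in topic modeling, but it is an assumption about how such annotation could work, not the PLUMAPs implementation; the function `cluster_keywords`, the cluster IDs, and the annotation strings are hypothetical.

```python
import math
from collections import Counter


def cluster_keywords(cluster_texts, top_k=2):
    """Rank words per cluster by TF-IDF, pooling each cluster's annotations into one document."""
    # Term counts per cluster.
    tf = {
        cid: Counter(word for text in texts for word in text.lower().split())
        for cid, texts in cluster_texts.items()
    }
    n_clusters = len(tf)
    # Document frequency: the number of clusters in which each word appears.
    df = Counter(word for counts in tf.values() for word in counts)
    keywords = {}
    for cid, counts in tf.items():
        total = sum(counts.values())
        scores = {
            word: (count / total) * math.log(n_clusters / df[word])
            for word, count in counts.items()
        }
        # Highest score first; ties broken alphabetically so output is deterministic.
        ranked = sorted(scores, key=lambda word: (-scores[word], word))
        keywords[cid] = ranked[:top_k]
    return keywords


clusters = {
    "c0": ["short-chain dehydrogenase protein", "short-chain reductase protein", "dehydrogenase protein"],
    "c1": ["methyltransferase protein", "sam-dependent methyltransferase protein"],
}
print(cluster_keywords(clusters))
# {'c0': ['dehydrogenase', 'short-chain'], 'c1': ['methyltransferase', 'sam-dependent']}
```

Pooling annotations per cluster, rather than scoring each protein separately, is what makes the labels describe a cluster as a whole.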