Recognition of short linear motifs (SLiMs) or peptides by proteins is an important component of many cellular processes. However, because binding motifs are limited in length and degenerate, predicting cellular targets is challenging. In addition, many of these interactions are transient and of relatively low affinity. Here, we focus on one of the largest families of SLiM-binding domains in the human proteome, the PDZ domain. These domains bind the extreme C-terminus of target proteins and are involved in many signaling and trafficking pathways. To predict endogenous targets of PDZ domains, we developed MotifAnalyzer-PDZ, a program that filters and compares all motif-satisfying sequences in any publicly available proteome. This approach enables us to identify possible PDZ binding targets in humans and other organisms. Using this program, we predicted and biochemically tested novel human PDZ targets by looking for strong evolutionary sequence conservation. We also identified three C-terminal sequences in choanoflagellates that bind a choanoflagellate PDZ domain, the Monosiga brevicollis SHANK1 PDZ domain (mbSHANK1), with endogenously relevant affinities, despite a lack of conservation with the targets of a homologous human PDZ domain, SHANK1. All three are predicted to be signaling proteins, with strong sequence similarity to cytosolic and receptor tyrosine kinases. Finally, we analyzed and compared the positional amino acid enrichments in PDZ motif-satisfying sequences from over a dozen organisms. Overall, MotifAnalyzer-PDZ is a versatile program for investigating potential PDZ interactions. This proof-of-concept work is poised to enable similar analyses for other SLiM-binding domains (e.g., MotifAnalyzer-Kinase). MotifAnalyzer-PDZ is available at
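The sketch below illustrates the two computational ideas named in this abstract: filtering a proteome for sequences whose extreme C-terminus satisfies a PDZ motif, and computing positional amino acid enrichment among the hits. It is not the MotifAnalyzer-PDZ implementation. It assumes the textbook class I PDZ motif X-[S/T]-X-[V/I/L]-COOH, a six-residue C-terminal window, and enrichment defined as observed frequency divided by whole-proteome background frequency. The names `parse_fasta`, `motif_hits`, `positional_enrichment` and `CLASS_I` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical pattern for a class I PDZ motif, X-[S/T]-X-[V/I/L]-COOH.
# The "$" anchor requires the match to end at the final residue (P0).
CLASS_I = re.compile(r"[ST].[VIL]$")


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text.

    A trailing '*' (stop symbol used by some proteome files) is stripped so
    that the last character is the true C-terminal residue.
    """
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks).rstrip("*")
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks).rstrip("*")


def motif_hits(records, pattern=CLASS_I, window=6):
    """Map header -> last `window` residues for sequences whose C-terminus matches."""
    return {h: s[-window:] for h, s in records if pattern.search(s)}


def positional_enrichment(tails, background, window=6):
    """Observed/background frequency ratio for each residue at each position.

    Keys are 0, -1, ..., -(window - 1), counted back from the C-terminus (P0).
    `background` is a Counter of residue counts over the whole proteome.
    """
    total_bg = sum(background.values())
    result = {}
    for offset in range(window):
        # Residue at position P(-offset) of every tail long enough to have one.
        column = [t[-1 - offset] for t in tails if len(t) > offset]
        if not column:
            continue
        counts = Counter(column)
        n = len(column)
        result[-offset] = {
            aa: (c / n) / (background[aa] / total_bg)
            for aa, c in counts.items()
            if background[aa] > 0
        }
    return result
```

For a local proteome file, `motif_hits(parse_fasta(open("proteome.fasta").read()))` would return the C-terminal tails of all class I matches (the file name is a placeholder). Swapping in a different compiled regex, for example one for the class II motif X-Φ-X-Φ-COOH, changes only the `pattern` argument.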
Gram-positive bacteria are among the earliest known life forms, having diverged from Gram-negative bacteria 2 billion years ago. These organisms use sortase enzymes to attach proteins to their peptidoglycan cell wall, a structural feature that distinguishes the two types of bacteria. The transpeptidase activity of sortases makes them an important tool in protein engineering applications, e.g., sortase-mediated ligation, or sortagging. However, because of their relatively low catalytic efficiency, efforts are ongoing to create better sortase variants for these uses. Here, we combine two bioinformatics tools, principal component analysis and ancestral sequence reconstruction, with protein biochemistry to analyze natural sequence variation in these enzymes. Principal component analysis of the sortase superfamily distinguishes previously described classes and identifies regions of relatively high sequence variation in structurally conserved loops within each sortase family, including loops near the active site. Using ancestral sequence reconstruction, we determined the sequences of ancestral Staphylococcus and Streptococcus Class A sortases. Enzyme assays revealed that the ancestral Streptococcus enzyme is relatively active and shares sequence variation similar to that of other Class A Streptococcus sortases. Taken together, our results highlight how natural sequence variation can be used to investigate this important protein family and suggest that these and similar techniques may help discover or design sortases with increased catalytic efficiency and/or selectivity for sortase-mediated ligation experiments.
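To make the principal component analysis step more concrete, the sketch below one-hot encodes a multiple sequence alignment, projects the sequences onto the leading principal components using NumPy's SVD, and sums absolute loadings per alignment column to flag positions that drive the separation. This is an illustrative assumption, not the authors' published pipeline: the 21-letter alphabet (20 amino acids plus gap), the encoding, the per-position loading summary, and the function names `one_hot`, `pca` and `loading_per_position` are all hypothetical.

```python
import numpy as np

# Assumed alphabet: the 20 standard amino acids plus the alignment gap.
# Any other symbol (e.g. 'X') raises KeyError in one_hot().
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"


def one_hot(alignment):
    """Encode equal-length aligned sequences as an (n_seqs, length * 21) 0/1 matrix."""
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("all aligned sequences must have the same length")
    index = {aa: i for i, aa in enumerate(ALPHABET)}
    X = np.zeros((len(alignment), length * len(ALPHABET)))
    for row, seq in enumerate(alignment):
        for pos, aa in enumerate(seq.upper()):
            X[row, pos * len(ALPHABET) + index[aa]] = 1.0
    return X


def pca(X, n_components=2):
    """Return (scores, loadings) from an SVD of the column-centered matrix."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    return scores, Vt[:n_components]


def loading_per_position(loadings, length):
    """Sum absolute loadings over the 21 one-hot columns at each alignment position."""
    per_col = np.abs(loadings).reshape(loadings.shape[0], length, len(ALPHABET))
    return per_col.sum(axis=2)
```

Alignment columns with large summed loadings on PC1 or PC2 are the positions that separate groups of sequences, which is one way to locate variable regions such as loops. In practice, the alignment itself would first be built with a dedicated multiple sequence alignment tool.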