In this study we show that protein language models can encode structural and functional information of GPCR sequences that can be used to predict their signaling and functional repertoire. We used the ESM1b protein embeddings as features and the binding information known from publicly available studies to develop PRECOGx, a machine learning predictor to explore GPCR interactions with G protein and β-arrestin, which we made available through a new webserver (https://precogx.bioinfolab.sns.it/). PRECOGx outperformed its predecessor (e.g. PRECOG) in predicting GPCR-transducer couplings, being also able to consider all GPCR classes. The webserver also provides new functionalities, such as the projection of input sequences on a low-dimensional space describing essential features of the human GPCRome, which is used as a reference to track GPCR variants. Additionally, it allows inspection of the sequence and structural determinants responsible for coupling via the analysis of the most important attention maps used by the models as well as through predicted intramolecular contacts. We demonstrate applications of PRECOGx by predicting the impact of disease variants (ClinVar) and alternative splice forms from healthy tissues (GTEX) of human GPCRs, revealing the power to dissect system biasing mechanisms in both health and disease.
GPCRs are master regulators of cell signaling by transducing extracellular stimuli into the cell via selective coupling to intracellular G-proteins. Here we present a computational analysis of the structural determinants of G-protein-coupling repertoire of experimental and predicted 3D GPCR-G-protein complexes. Interface contact analysis recapitulates structural hallmarks associated with G-protein-coupling specificity, including TM5, TM6 and ICLs. We employ interface contacts as fingerprints to cluster Gs vs Gi complexes in an unsupervised fashion, suggesting that interface residues contribute to selective coupling. We experimentally confirm on a promiscuous receptor (CCKAR) that mutations of some of these specificity-determining positions bias the coupling selectivity. Interestingly, Gs-GPCR complexes have more conserved interfaces, while Gi/o proteins adopt a wider number of alternative docking poses, as assessed via structural alignments of representative 3D complexes. Binding energy calculations demonstrate that distinct structural properties of the complexes are associated to higher stability of Gs than Gi/o complexes. AlphaFold2 predictions of experimental binary complexes confirm several of these structural features and allow us to augment the structural coverage of poorly characterized complexes such as G12/13.
A ranked tree topology is a tree topology with a temporal ordering of its coalescence events. Under the multispecies coalescent model, we consider ranked gene tree topologies realized along the branches of ranked species trees, where one gene copy is sampled for each species. Previous results have demonstrated that for almost all ranked species tree topologies with at least five species, there exists a set of branch lengths such that the maximally probable ranked gene tree topologies-those generated with the highest probability under the model-do not match the species tree ranked topology. Here, we focus on the agreement of a ranked species tree with its maximally probable ranked gene tree topologies in terms of their unranked topology, that is, disregarding the ordering of the coalescence events. We show that although the set of maximally probable ranked gene tree topologies for a ranked species tree can contain ranked trees with different unranked topologies, at least one of these maximal ranked gene tree topologies must have the same unranked topology as the species tree. Our results contribute to the study of the relationships between gene trees and species trees.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.