Motivation After the outstanding breakthrough of AlphaFold in predicting protein 3D models, new questions appeared and remain unanswered. The ensemble nature of proteins, for example, challenges the structural prediction methods because the models should represent a set of conformers instead of single structures. The evolutionary and structural features captured by effective deep learning techniques may unveil the information to generate several diverse conformations from a single sequence. Here we address the performance of AlphaFold2 predictions obtained through ColabFold under this ensemble paradigm. Results Using a curated collection of apo-holo pairs of conformers, we found that AlphaFold2 predicts the holo form of a protein in ∼70% of the cases, being unable to reproduce the observed conformational diversity with the same error for both conformers. More importantly, we found that AlphaFold2's performance worsens with the increasing conformational diversity of the studied protein. This impairment is related to the heterogeneity in the degree of conformational diversity found between different members of the homologous family of the protein under study. Finally, we found that main-chain flexibility associated with apo-holo pairs of conformers negatively correlates with the predicted local model quality score plDDT, indicating that plDDT values in a single 3D model could be used to infer local conformational changes linked to ligand binding transitions. Availability Data and code used in this manuscript are publicly available at https://gitlab.com/sbgunq/publications/af2confdiv-oct2021 Supplementary Information Supplementary data is available at the journal's web site.
Protein motions are a key feature to understand biological function. Recently, a large-scale analysis of protein conformational diversity showed a positively skewed distribution with a peak at 0.5 Å C-alpha root-mean-square-deviation (RMSD). To understand this distribution in terms of structure-function relationships, we studied a well curated and large dataset of ~5,000 proteins with experimentally determined conformational diversity. We searched for global behaviour patterns studying how structure-based features change among the available conformer population for each protein. This procedure allowed us to describe the RMSD distribution in terms of three main protein classes sharing given properties. The largest of these protein subsets (~60%), which we call “rigid” (average RMSD = 0.83 Å), has no disordered regions, shows low conformational diversity, the largest tunnels and smaller and buried cavities. The two additional subsets contain disordered regions, but with differential sequence composition and behaviour. Partially disordered proteins have on average 67% of their conformers with disordered regions, average RMSD = 1.1 Å, the highest number of hinges and the longest disordered regions. In contrast, malleable proteins have on average only 25% of disordered conformers and average RMSD = 1.3 Å, flexible cavities affected in size by the presence of disordered regions and show the highest diversity of cognate ligands. Proteins in each set are mostly non-homologous to each other, share no given fold class, nor functional similarity but do share features derived from their conformer population. These shared features could represent conformational mechanisms related with biological functions.
Protein-protein interactions are essential to all aspects of life. Specific interactions result from evolutionary pressure at the interacting interfaces of partner proteins. However, evolutionary pressure is not homogeneous within the interface: for instance, each residue does not contribute equally to the binding energy of the complex. To understand functional differences between residues within the interface, we analyzed their properties in the core and rim regions. Here, we characterized protein interfaces with two evolutionary measures, conservation and coevolution, using a comprehensive dataset of 896 protein complexes. These scores can detect different selection pressures at a given position in a multiple sequence alignment. We also analyzed how the number of interactions in which a residue is involved influences those evolutionary signals. We found that the coevolutionary signal is higher in the interface core than in the interface rim region. Additionally, the difference in coevolution between core and rim regions is comparable to the known difference in conservation between those regions. Considering proteins with multiple interactions, we found that conservation and coevolution increase with the number of different interfaces in which a residue is involved, suggesting that more constraints (i.e., a residue that must satisfy a greater number of interactions) allow fewer sequence changes at those positions, resulting in higher conservation and coevolution values. These findings shed light on the evolution of protein interfaces and provide information useful for identifying protein interfaces and predicting protein-protein interactions.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.