It has long been recognized that certain sites within a protein, such as sites in the protein core or catalytic residues in enzymes, are more conserved than are other sites. However, our understanding of rate variation among sites remains surprisingly limited. Recent progress to address this includes the development of a wide array of reliable methods to estimate site-specific substitution rates from sequence alignments. In addition, several molecular traits have been identified that correlate with site-specific rates, and novel mechanistic, biophysical models have been proposed to explain the observed correlations. Nonetheless, at best, current models explain approximately 60% of the observed variance, highlighting the limitations of current methods and models, and the need for new research directions.
The interface of protein structural biology, protein biophysics, molecular evolution, and molecular population genetics forms the foundation for a mechanistic understanding of many aspects of protein biochemistry. Current efforts in interdisciplinary protein modeling are in their infancy, and the state of the art of such models is described. Beyond the relationship between amino acid substitution and static protein structure, protein function, and corresponding organismal fitness, other considerations are also discussed. More complex mutational processes, such as insertions and deletions, domain rearrangements, and even circular permutations, should be evaluated. The role of intrinsically disordered proteins is still controversial, but may be increasingly important to consider. Protein geometry and protein dynamics, which go beyond static considerations of protein structure, are also important. Protein expression level is known to be a major determinant of evolutionary rate, and several related considerations, including selection at the mRNA level and the role of interaction specificity, are discussed. Lastly, the relationship between modeling and the high-throughput experimental data it requires, as well as the experimental examination of protein evolution using ancestral sequence resurrection and in vitro biochemistry, are presented, with the aim of ultimately generating better models for biological inference and prediction.
Three genes from Arabidopsis thaliana with high sequence similarity to gamma carbonic anhydrase (gammaCA), a Zn-containing enzyme from Methanosarcina thermophila (CAM), were identified and characterized. Evolutionary and structural analyses predict that these genes code for active forms of gammaCA. Phylogenetic analyses reveal that these Arabidopsis gene products cluster together with CAM and related sequences from alpha and gamma proteobacteria, organisms proposed as the mitochondrial endosymbiont ancestor. Indeed, in vitro and in vivo experiments indicate that these gene products are transported into the mitochondria, as occurs with several mitochondrial protein genes transferred, during evolution, from the endosymbiotic bacteria to the host genome. Moreover, putative CAM orthologous genes are detected in other plants and green algae and are predicted to be imported into mitochondria. Structural modeling and sequence analysis performed on more than a hundred homologous sequences show a high conservation of functionally important active site residues. Thus, the three histidine residues involved in Zn coordination (His 81, 117 and 122), Arg 59, Asp 61, Gln 75, and Asp 76 of CAM are conserved and properly arranged in the active site cavity of the models. Two other functionally important residues (Glu 62 and Glu 84 of CAM) are lacking, but alternative amino acids that might fulfill their roles are postulated. Accordingly, we propose that photosynthetic eukaryotic organisms (green algae and plants) contain gammaCAs and that these enzymes, encoded by nuclear genes, are imported into mitochondria to accomplish their biological function.
Protein sequences evolve under selection pressures imposed by functional and biophysical requirements, resulting in site-dependent rates of amino acid substitution. Relative solvent accessibility (RSA) and local packing density (LPD) have emerged as the best candidates to quantify structural constraint. Recent research assumes that RSA is the main determinant of sequence divergence. However, it is not yet clear which is the best predictor of substitution rates. To address this issue, we compared RSA and LPD with site-specific rates of evolution for a diverse data set of enzymes. In contrast with recent studies, we found that LPD measures correlate better than RSA with evolutionary rate. Moreover, the independent contribution of RSA is minor. Taking into account that LPD is related to backbone flexibility, we put forward the possibility that the rate of evolution of a site is determined by the ease with which the backbone deforms to accommodate mutations.
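The comparison described in this abstract, where two structural predictors are each correlated with site-specific rates and the independent contribution of one is then assessed, can be sketched as follows. This is a minimal illustration only: the abstract does not state which correlation statistic or software was used, so the choice of Pearson and first-order partial correlation, and the function names `pearson` and `partial_corr`, are assumptions made for this sketch.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    # Raises ZeroDivisionError if either sequence is constant.
    return sxy / sqrt(sxx * syy)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y, controlling for z.

    Hypothetical use: x = site-specific rates, y = RSA per site,
    z = LPD per site. The result estimates how much RSA still tracks
    rate once the linear effect of LPD has been removed.
    """
    r_xy = pearson(x, y)
    r_xz = pearson(x, z)
    r_yz = pearson(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
```

Given per-site lists `rates`, `rsa`, and `lpd` of equal length (hypothetical names), `pearson(rates, lpd)` and `pearson(rates, rsa)` give the two raw correlations, and `partial_corr(rates, rsa, lpd)` gives the contribution of RSA that is independent of LPD. A rank-based variant would apply the same functions to the ranks of each variable.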