Homology modeling is a powerful tool for predicting a protein's structure. This approach is successful because proteins whose sequences are only 30% identical still adopt the same structure, while structure similarity rapidly deteriorates beyond the 30% threshold. By studying the divergence of protein structure as sequence evolves in real proteins and in evolutionary simulations, we show that this nonlinear sequence-structure relationship emerges as a result of selection for protein folding stability in divergent evolution. Fitness constraints prevent the emergence of unstable protein evolutionary intermediates, thereby enforcing evolutionary paths that preserve protein structure despite broad sequence divergence. However, on longer timescales, evolution is punctuated by rare events where the fitness barriers obstructing structure evolution are overcome and discovery of new structures occurs. We outline biophysical and evolutionary rationale for broad variation in protein family sizes, prevalence of compact structures among ancient proteins, and more rapid structure evolution of proteins with lower packing density.
Introduction
A wide variety of protein structures exist in nature; however, the evolutionary origins of this panoply of proteins remain unknown. While protein sequence evolution is easily traced in nature and produced in the laboratory, the emergence of new protein structures is rarely observed and difficult to engineer (1-3). One approach to studying structure evolution is to examine how proteins' structural similarity varies over a range of sequence identities. Such investigations proceed by aligning many pairs of proteins so that their sequence identity (or another measure of sequence similarity) and structural similarity can be assessed (4-8). The result is a cusped relationship between sequence and structure divergence: sequences reliably diverge up to 70% without significant protein structure evolution.
Below 30% sequence identity, the structural similarity between proteins abruptly decreases, giving rise to a "twilight zone" where little can be said about the relationship between sequence identity and structural similarity without more advanced methods. This finding is the foundation of one of the most important methods in protein biophysics: structure homology modeling (9, 10). Despite the fact that the plateau of high structural similarity above 30% sequence identity has been crucial for homology modeling and that many of the advanced structure prediction methods have been motivated by the abrupt onset of the twilight zone, the cusped relationship between sequence and structural similarity has not yet received a detailed biophysical justification (11, 12).
Previous work characterized the relationship between sequence and structure similarity by fitting the data empirically with an exponential function, and the adequacy of this model was interpreted as evidence in favor of the local model of protein structure determination, namely, that only a key subset of residues encode a protein's struct...