An important goal in molecular biology is to understand functional changes upon single-point mutations in proteins. Doing so through a detailed characterization of structure spaces and underlying energy landscapes is desirable but continues to challenge methods based on Molecular Dynamics. In this paper we propose a novel algorithm, SIfTER, which instead relies on stochastic optimization to circumvent the computational challenge of exploring the breadth of a protein’s structure space. SIfTER is a data-driven evolutionary algorithm that leverages experimentally available structures of wildtype and variant sequences of a protein to define a reduced search space from which to efficiently draw samples corresponding to novel structures not directly observed in the wet laboratory. The main advantage of SIfTER is its ability to rapidly generate conformational ensembles, which makes it possible to map and juxtapose the landscapes of variant sequences and to relate observed differences to functional changes. We apply SIfTER to variant sequences of the H-Ras catalytic domain because of the prominent role of the Ras protein in signaling pathways that control cell proliferation, its well-studied conformational switching, and the abundance of documented mutations in several human tumors. Many Ras mutations are oncogenic, but detailed energy landscapes for them have not been reported until now. Analysis of SIfTER-computed energy landscapes for the wildtype and two oncogenic variants, G12V and Q61L, suggests that these mutations cause constitutive activation through two different mechanisms: G12V directly affects binding specificity while leaving the energy landscape largely unchanged, whereas Q61L has a pronounced effect on the landscape. An implementation of SIfTER is made available at http://www.cs.gmu.edu/~ashehu/?q=OurTools.
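The idea of mapping and juxtaposing landscapes of variant sequences can be pictured with a toy sketch. It assumes each computed conformation has already been projected onto two principal components and scored by some energy function; the sample format, the grid binning, and the function names `landscape_map` and `compare_maps` are illustrative assumptions, not SIfTER's implementation.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical sample format: (pc1, pc2, energy) for one computed conformation.
Sample = Tuple[float, float, float]
Cell = Tuple[int, int]


def landscape_map(samples: List[Sample], bin_width: float) -> Dict[Cell, float]:
    """Bin samples on a 2D grid over the reduced space, keeping the lowest energy per cell."""
    grid: Dict[Cell, float] = {}
    for x, y, energy in samples:
        cell = (math.floor(x / bin_width), math.floor(y / bin_width))
        if cell not in grid or energy < grid[cell]:
            grid[cell] = energy
    return grid


def compare_maps(wildtype: Dict[Cell, float],
                 variant: Dict[Cell, float]) -> Dict[Cell, float]:
    """Energy difference (variant minus wildtype) on cells populated in both maps."""
    return {c: variant[c] - wildtype[c] for c in wildtype.keys() & variant.keys()}


# Toy numbers for illustration only (not SIfTER output).
wt = landscape_map([(0.2, 0.3, -10.0), (0.4, 0.1, -12.0), (1.5, 0.2, -5.0)], 1.0)
var = landscape_map([(0.5, 0.5, -11.0), (2.5, 0.0, -3.0)], 1.0)
print(wt)                     # {(0, 0): -12.0, (1, 0): -5.0}
print(compare_maps(wt, var))  # {(0, 0): 1.0}
```

Keeping only the minimum energy per cell is one simple way to turn a sample set into a comparable map; cells populated by only one variant hint at regions that a mutation opens up or closes off.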
We believe SIfTER is useful to the community for answering the question of how sequence mutations affect the function of a protein when abundant experimental structures are available to reconstruct an energy landscape that would be computationally impractical to obtain via Molecular Dynamics.
Evidence is emerging that many proteins involved in proteinopathies are dynamic molecules that switch between stable and semistable structures to modulate their function. A detailed understanding of the relationship between structure and function in such molecules demands a comprehensive characterization of their conformation space. Currently, only stochastic optimization methods are capable of exploring conformation spaces to obtain sample-based representations of the associated energy surfaces. These methods must address the fundamental and challenging issue of balancing computational resources between exploration (obtaining a broad view of the space) and exploitation (going deep in the energy surface). We propose a novel algorithm that strikes an effective balance by employing concepts from evolutionary computation. The algorithm leverages deposited crystal structures of wildtype and variant sequences of a protein to define a reduced, low-dimensional search space from which to rapidly draw samples. A multiscale technique maps samples to local minima of the all-atom energy surface of the protein under investigation. Several novel algorithmic strategies are employed to avoid premature convergence to particular minima and to obtain a broad view of a possibly multibasin energy surface. Applications on different proteins demonstrate the broad utility of the algorithm for mapping multibasin energy landscapes and advancing the modeling of multibasin proteins. In particular, applications on wildtype and variant sequences of proteins involved in proteinopathies demonstrate that the algorithm takes an important first step toward understanding the impact of sequence mutations on malfunction by providing the energy landscape as the intermediate explanatory link between protein sequence and function.
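The exploration/exploitation balance described above can be illustrated with a generic memetic evolutionary loop, in which random perturbation spreads samples across the space and a local minimizer takes each sample to the bottom of its basin. This is a toy sketch under stated assumptions, not the published algorithm: the Rastrigin function stands in for an all-atom energy function, greedy coordinate descent stands in for the multiscale mapping to local minima, and all names and parameter values are hypothetical.

```python
import math
import random
from typing import Callable, List

Vector = List[float]
Energy = Callable[[Vector], float]


def rastrigin(x: Vector) -> float:
    """Toy multimodal 'energy' (Rastrigin); global minimum 0 at the origin."""
    return sum(xi * xi - 10.0 * math.cos(2.0 * math.pi * xi) + 10.0 for xi in x)


def local_minimize(x: Vector, energy: Energy, step: float = 0.05,
                   max_iters: int = 200) -> Vector:
    """Greedy coordinate descent that pulls a sample into a nearby local minimum."""
    best, best_e = list(x), energy(x)
    for _ in range(max_iters):
        improved = False
        for i in range(len(best)):
            for delta in (step, -step):
                trial = list(best)
                trial[i] += delta
                e = energy(trial)
                if e < best_e:
                    best, best_e, improved = trial, e, True
        if not improved:
            step /= 2.0  # refine the step once no move helps
            if step < 1e-6:
                break
    return best


def memetic_search(energy: Energy, dim: int, pop_size: int = 20,
                   generations: int = 30, sigma: float = 1.0,
                   bound: float = 5.0, seed: int = 0) -> List[Vector]:
    rng = random.Random(seed)
    # Broad uniform initialization (exploration), each sample mapped to a minimum.
    pop = [local_minimize([rng.uniform(-bound, bound) for _ in range(dim)], energy)
           for _ in range(pop_size)]
    for _ in range(generations):
        # Gaussian mutation lets offspring hop to neighboring basins (exploration),
        # and local minimization takes them to the bottom of a basin (exploitation).
        children = [local_minimize([xi + rng.gauss(0.0, sigma) for xi in p], energy)
                    for p in pop]
        # Truncation selection over parents + children keeps the lowest energies.
        pop = sorted(pop + children, key=energy)[:pop_size]
    return pop


population = memetic_search(rastrigin, dim=2)
print(rastrigin(population[0]))
```

The mutation scale `sigma` is the main exploration knob here: larger values make more basin-to-basin jumps, while the local minimizer guarantees every kept sample is a refined minimum.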
The emerging picture of proteins as dynamic systems switching between structures to modulate function demands a comprehensive structural characterization that is only possible through an energy landscape treatment. Only sample-based representations of a protein energy landscape are viable in silico, and sampling-based exploration algorithms must address the fundamental and challenging issue of balancing exploration (a broad view) against exploitation (going deep). We propose here a novel algorithm that achieves this balance by combining concepts from evolutionary computation and protein modeling research. The algorithm draws samples from a reduced space obtained via principal component analysis of known experimental structures. Samples are lifted from the reduced space to an all-atom structure space, where they are then mapped to nearby local minima in the all-atom energy landscape. From an algorithmic point of view, this paper makes several contributions, including the design of a local selection operator that is crucial for avoiding premature convergence. From an application point of view, this paper demonstrates the utility of the proposed evolutionary algorithm for advancing the understanding of multi-basin proteins. In particular, the proposed algorithm takes the first steps toward answering the question of how sequence mutations affect function in proteins at the center of proteinopathies by providing the energy landscape as the intermediate explanatory link between protein sequence and function.
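The abstract names a local selection operator but does not describe how it works. Global truncation selection (keeping the lowest-energy members overall) tends to collapse a population into a single basin; one common way to make selection local, sketched below as an assumption rather than as the paper's operator, is crowding-style replacement in which each child competes only with its nearest population member. The names `local_replace` and `double_well` are hypothetical.

```python
import math
from typing import Callable, List

Vector = List[float]


def distance(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def local_replace(population: List[Vector], children: List[Vector],
                  energy: Callable[[Vector], float]) -> List[Vector]:
    """Each child competes only with its nearest population member.

    A low-energy child can displace a neighbor in its own region of the space,
    but cannot evict members sitting in distant basins.
    """
    pop = [list(p) for p in population]
    for child in children:
        j = min(range(len(pop)), key=lambda k: distance(pop[k], child))
        if energy(child) < energy(pop[j]):
            pop[j] = list(child)
    return pop


def double_well(x: Vector) -> float:
    """Toy 1D energy with two minima, at x = -1 and x = +1."""
    return (x[0] ** 2 - 1.0) ** 2


population = [[-1.2], [1.5]]
children = [[1.0], [0.98]]  # both offspring landed in the right-hand well
print(local_replace(population, children, double_well))  # [[-1.2], [1.0]]
```

Global truncation over the same four candidates would instead keep `[1.0]` and `[0.98]`, discarding the left-hand well entirely; local replacement keeps one representative per basin.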
The Ras enzyme mediates critical signaling pathways in cell proliferation and development by transitioning between GTP-bound (active) and GDP-bound (inactive) states. Many cancers are linked to specific Ras mutations that affect its conformational switching between active and inactive states. A detailed understanding of the sequence-structure-function space in Ras is missing; in this paper, we take the first steps toward such an understanding. We conduct a detailed analysis of X-ray structures of wildtype and mutant variants of Ras. We embed the structures into a low-dimensional structure space by means of Principal Component Analysis (PCA) and show that these structures are energetically feasible for wildtype Ras. We then propose a probabilistic conformational search algorithm to further populate the structure space of wildtype Ras. The algorithm explores a low-dimensional map guided by the principal components obtained through PCA. Generated conformations are rebuilt in all-atom detail and energetically refined with Rosetta to further populate the structure space of wildtype Ras with energetically feasible structures. Results reveal a variety of novel structures, some of which reproduce experimental structures that were withheld from the PCA for validation. This work is a first step toward a comprehensive characterization of the sequence-structure space in Ras, which promises to reveal novel structures not probed in the wet laboratory, suggest new mutations, propose new binding sites, and even elucidate unknown interacting partners of Ras.
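The embedding and sampling steps described above can be sketched in pure Python. Assumptions: each structure is a flattened vector of already-superimposed coordinates (for example, Cα atoms), PCA is computed by power iteration with deflation rather than the SVD a production code would use, and new reduced coordinates are drawn from a Gaussian scaled by each component's variance. The all-atom rebuilding and Rosetta refinement are not shown, and all function names are hypothetical.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def mean_vector(rows: List[Vector]) -> Vector:
    return [sum(col) / len(rows) for col in zip(*rows)]


def covariance(rows: List[Vector], mu: Vector) -> Matrix:
    centered = [[x - m for x, m in zip(r, mu)] for r in rows]
    d, n = len(mu), len(rows)
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def principal_components(cov: Matrix, k: int, iters: int = 500,
                         seed: int = 0) -> Tuple[List[Vector], List[float]]:
    """Top-k eigenvectors and eigenvalues via power iteration with deflation."""
    rng = random.Random(seed)
    a = [row[:] for row in cov]
    d = len(a)
    comps: List[Vector] = []
    variances: List[float] = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(aij * vj for aij, vj in zip(row, v)) for row in a]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # no variance left; keep the current unit vector
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(a[i][j] * v[j] for j in range(d)) for i in range(d))
        comps.append(v)
        variances.append(lam)
        # Deflate: remove the found direction so the next pass finds the next PC.
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return comps, variances


def project(x: Vector, mu: Vector, comps: List[Vector]) -> Vector:
    """Map a structure vector to its reduced (PC) coordinates."""
    return [sum(ci * (xi - mi) for ci, xi, mi in zip(c, x, mu)) for c in comps]


def lift(y: Vector, mu: Vector, comps: List[Vector]) -> Vector:
    """Map reduced coordinates back to the full coordinate space."""
    return [m + sum(yk * c[i] for yk, c in zip(y, comps)) for i, m in enumerate(mu)]


def sample_reduced(variances: List[float], scale: float, rng: random.Random) -> Vector:
    """Draw reduced coordinates whose spread follows each PC's variance."""
    return [rng.gauss(0.0, scale * math.sqrt(max(v, 0.0))) for v in variances]


# Toy "structures": four collinear 3D points standing in for flattened coordinates.
rows = [[0.0, 0.0, 0.0], [1.0, 2.0, 0.0], [2.0, 4.0, 0.0], [3.0, 6.0, 0.0]]
mu = mean_vector(rows)
comps, variances = principal_components(covariance(rows, mu), k=1)
y = project(rows[3], mu, comps)  # reduced coordinate of a known structure
x_new = lift(sample_reduced(variances, 1.0, random.Random(1)), mu, comps)  # new sample
print(round(variances[0], 6))  # 8.333333
```

Lifting a sample only yields a coarse coordinate vector on the PC plane; in the workflow described above, such vectors would still need all-atom rebuilding and energetic refinement.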
The focus on important diseases of our time has prompted many experimental labs to resolve and deposit functional structures of disease-causing or disease-participating proteins. As a result, many functional structures of wildtype and disease-involved variants of a protein now exist in structural databases. The objective for computational approaches is to use this information to discover features of the underlying energy landscape on which functional structures reside. Important questions about which subsets of structures are the most thermodynamically stable remain unanswered. The challenge is how to transform an essentially discrete problem into one where continuous optimization is suitable and effective. In this paper, we present such a transformation, which allows adapting and applying evolution strategies to explore an underlying continuous variable space and locate the global optimum of a multimodal fitness landscape. The paper presents results on wildtype and mutant sequences of proteins implicated in human disorders such as cancer and amyotrophic lateral sclerosis. More generally, the paper offers a methodology for transforming a discrete problem into a continuous optimization problem as a possible way to address outstanding discrete problems in the evolutionary computation community.
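The abstract does not spell out the discrete-to-continuous transformation, so the sketch below shows only the optimizer side: a textbook (1+1) evolution strategy with Rechenberg's one-fifth success rule, minimizing a continuous objective. The function name, the window of 20 mutations, the adaptation factor of 1.5, and the sphere test objective are illustrative choices, not the paper's settings.

```python
import random
from typing import Callable, List, Tuple

Vector = List[float]


def one_plus_one_es(f: Callable[[Vector], float], x0: Vector, sigma: float = 1.0,
                    budget: int = 2000, window: int = 20,
                    seed: int = 0) -> Tuple[Vector, float]:
    """Minimize f with a (1+1)-ES and the 1/5 success rule for the step size."""
    rng = random.Random(seed)
    x, fx = list(x0), f(x0)
    successes = 0
    for t in range(1, budget + 1):
        # Isotropic Gaussian mutation of the single parent.
        y = [xi + sigma * rng.gauss(0.0, 1.0) for xi in x]
        fy = f(y)
        if fy <= fx:  # plus-selection: keep the offspring only if it is not worse
            x, fx = y, fy
            successes += 1
        if t % window == 0:
            # More than 1/5 successes: steps are too timid, so enlarge them;
            # otherwise they overshoot, so shrink them.
            sigma *= 1.5 if successes / window > 0.2 else 1.0 / 1.5
            successes = 0
    return x, fx


def sphere(x: Vector) -> float:
    return sum(xi * xi for xi in x)


best_x, best_f = one_plus_one_es(sphere, [3.0, -2.0])
print(best_f)
```

A single (1+1)-ES run converges into one basin, so on a multimodal landscape it would typically be restarted from several starting points or replaced by a population-based strategy.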