Simulating protein folding has been a challenging problem for decades, both because the timescales involved are long compared with what can be simulated and because it is difficult to gain insight from complex simulation data. Markov State Models (MSMs) present a means to tackle both of these challenges, yielding simulations on experimentally relevant timescales, statistical significance, and coarse-grained representations that are readily understandable to humans. Here, we review this method for non-experts, in order to introduce it to a broader audience. We review the motivations, methods, and caveats of MSMs, as well as some recent highlights of applications of the method. We conclude by discussing how this approach is part of a paradigm shift in how one uses simulations, away from anecdotal single-trajectory approaches and toward a more comprehensive statistical approach.
Markov State Models provide a framework for understanding the fundamental states and rates in the conformational dynamics of biomolecules. We describe an improved protocol for constructing Markov State Models from molecular dynamics simulations. The new protocol includes advances in clustering, data preparation, and model estimation; these improvements lead to significant increases in model accuracy, as assessed by the ability to recapitulate equilibrium and kinetic properties of reference systems. A high-performance implementation of this protocol, provided in MSMBuilder2, is validated on dynamics ranging from picoseconds to milliseconds.
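As an illustration of the model-estimation step, the following is a minimal sketch of how a transition matrix might be estimated from trajectories that have already been clustered into discrete states. It is not the MSMBuilder2 implementation: the function names, the toy state sequences, and the simple count symmetrization used to enforce detailed balance are all assumptions made for this example.

```python
import numpy as np

def count_transitions(dtrajs, n_states, lag):
    """Count i -> j transitions separated by `lag` frames, pooled over all trajectories."""
    counts = np.zeros((n_states, n_states))
    for dtraj in dtrajs:
        dtraj = np.asarray(dtraj)
        # np.add.at accumulates correctly when the same (i, j) pair repeats.
        np.add.at(counts, (dtraj[:-lag], dtraj[lag:]), 1.0)
    return counts

def estimate_transition_matrix(counts):
    """Row-normalize symmetrized counts (assumes every state was visited).

    Symmetrizing is a simple, hypothetical way to impose detailed balance;
    more careful estimators exist.
    """
    sym = counts + counts.T
    return sym / sym.sum(axis=1, keepdims=True)

# Toy discrete trajectories over two states (illustrative data only).
dtrajs = [[0, 0, 1, 1, 0], [1, 1, 1, 0, 0]]
counts = count_transitions(dtrajs, n_states=2, lag=1)
T = estimate_transition_matrix(counts)
print(counts)
print(T)
```

Pooling counts over many short, independent trajectories is what allows such a model to describe processes longer than any single trajectory, provided the lag time is long enough for the dynamics between states to be approximately Markovian.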
Markov state models (MSMs) are a powerful tool for modeling both the thermodynamics and kinetics of molecular systems. In addition, they provide a rigorous means to combine information from multiple sources into a single model and to direct future simulations/experiments to minimize uncertainties in the model. However, constructing MSMs is challenging because doing so requires decomposing the extremely high dimensional and rugged free energy landscape of a molecular system into long-lived states, also called metastable states. Thus, their application has generally required significant chemical intuition and hand-tuning. To address this limitation we have developed a toolkit for automating the construction of MSMs called MSMBUILDER (available at https://simtk.org/home/msmbuilder). In this work we demonstrate the application of MSMBUILDER to the villin headpiece (HP-35 NleNle), one of the smallest and fastest folding proteins. We show that the resulting MSM captures both the thermodynamics and kinetics of the original molecular dynamics of the system. As a first step toward experimental validation of our methodology we show that our model provides accurate structure prediction and that the longest timescale events correspond to folding.
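To make the link between an MSM and thermodynamics and kinetics concrete, the sketch below extracts equilibrium populations (the stationary distribution) and implied timescales from a transition matrix. The two-state matrix, the lag time of 1.0 in arbitrary units, and the function names are assumptions chosen for illustration; this is not MSMBUILDER code.

```python
import numpy as np

def stationary_distribution(T):
    """Equilibrium populations: the left eigenvector of T with eigenvalue 1."""
    evals, evecs = np.linalg.eig(T.T)
    idx = np.argmax(evals.real)
    pi = np.abs(evecs[:, idx].real)
    return pi / pi.sum()

def implied_timescales(T, lag_time):
    """Relaxation timescales t_i = -lag_time / ln(lambda_i) for eigenvalues in (0, 1)."""
    evals = np.sort(np.linalg.eigvals(T).real)[::-1]
    evals = evals[1:]  # drop the stationary eigenvalue (equal to 1)
    evals = evals[(evals > 0) & (evals < 1)]
    return -lag_time / np.log(evals)

# Hypothetical two-state transition matrix (rows sum to 1).
T = np.array([[0.9, 0.1],
              [0.2, 0.8]])
pi = stationary_distribution(T)
timescales = implied_timescales(T, lag_time=1.0)
print(pi)
print(timescales)
```

In this picture the slowest implied timescale corresponds to the slowest process the model resolves, which is why the longest timescales are the natural candidates to compare with folding.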
A complete understanding of how proteins fold, i.e. self-assemble to their biologically relevant "native state," remains an unattained goal [1]. Computer simulation, validated by experiment, is a natural means to elucidate this. There is over a million-fold range in folding rates, suggesting a possible diversity in mechanisms between slow and fast folding proteins [2]. Very fast (microsecond timescale) folding proteins [3,4] appear to fold via a large number of heterogeneous, parallel paths [5][6][7], potentially key for folding on such fast timescales. Does the folding of much slower proteins change this picture?
To date, the slowest-folding proteins folded ab initio by all-atom molecular dynamics simulations with fidelity to experimental kinetics have had folding times in the range of nanoseconds to microseconds. These include the designed mini-protein Trp-cage (~4.1 μs) [8], the villin headpiece domain (~10 μs) [9], a fast-folding variant of villin (<1 μs) [7], and Fip35 WW domain (~13 μs) [10]. In this communication, we report simulations of several folding trajectories, each from fully unfolded states, of the 39-residue protein NTL9(1-39), which experimentally has a folding time of ~1.5 milliseconds [11].
MD simulation
Trajectories were simulated via the Folding@Home distributed computing platform [12] at 300 K, 330 K, 370 K and 450 K from native, extended, and random-coil configurations using an accelerated version of GROMACS written for GPUs [13], for an aggregate time of 1.52 ms. GPUs play a key role here, allowing for dramatically longer trajectories than previously possible. The AMBER ff96 forcefield [14] with the GBSA solvation model [15] was used, a combination previously shown to give good results for folding the Fip35 WW domain [10] and to exhibit a good balance of native-like secondary structure for a set of small helical and beta sheet peptides studied by replica exchange [17].
Prediction of native structure and folding rates
We find that the native state (taken from the N-terminal domain of the crystal structure of ribosomal protein L9 [18]) is stable in this forcefield at 300 K, exhibiting decreasing stability with increasing temperature (Figure 1a). RMSD-Cα distributions after 10 μs show well-defined native and collapsed unfolded basins near 3 Å and 5 Å, respectively. Among the ~3000 trajectories started from unfolded (extended and coil) states at 370 K (Figure 1b), the observed number of folding events n is consistent with expectations from a simple model of parallel uncoupled folding simulations [19] in which folding is modeled as a two-state Poisson process.
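The two-state Poisson picture can be sketched numerically as follows. This is a generic illustration, not the expression used in the paper: it assumes independent trajectories, a single folding rate k, and trajectories short enough that each contributes at most one folding event, so that the expected number of events is k times the aggregate simulation time. The rate, the trajectory lengths, and the function names are hypothetical and are not taken from the NTL9 data.

```python
import math

def expected_folding_events(rate, traj_lengths):
    """Expected number of folding events: rate times the aggregate simulation time."""
    return rate * sum(traj_lengths)

def poisson_pmf(n, mean):
    """Probability of observing exactly n events when the expected count is `mean`."""
    return mean ** n * math.exp(-mean) / math.factorial(n)

# Hypothetical inputs: 0.5 folding events per ms, 100 trajectories of 0.02 ms each.
rate = 0.5
traj_lengths = [0.02] * 100
mean_events = expected_folding_events(rate, traj_lengths)

for n in range(4):
    print(n, poisson_pmf(n, mean_events))
```

Comparing the observed count n with this distribution is one way to check whether a set of short parallel simulations is consistent with a given folding rate.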
Simulations can provide tremendous insight into atomistic details of biological mechanisms, but micro- to millisecond timescales have historically been accessible only on dedicated supercomputers. We demonstrate that cloud computing is a viable alternative, bringing long-timescale processes within reach of a broader community. We used Google's Exacycle cloud computing platform to simulate 2 milliseconds of dynamics of the β2 adrenergic receptor, a G protein-coupled receptor (GPCR) that is a major drug target. Markov state models aggregate independent simulations into a single statistical model that is validated by previous computational and experimental results. Moreover, our models provide an atomistic description of the activation of a GPCR, revealing multiple activation pathways. Agonists and inverse agonists interact differentially with these pathways, with profound implications for drug design.