Free energy landscapes provide insights into conformational ensembles of biomolecules. To analyze these landscapes and elucidate the mechanisms underlying conformational changes, metastable states must be extracted with limited noise. This has remained a formidable task, despite a plethora of existing clustering methods. We present InfleCS, a novel method for extracting well-defined core states from free energy landscapes. The method is based on a Gaussian mixture free energy estimator and exploits the shape of the estimated density landscape. The core states that naturally arise from the clustering allow for detailed characterization of the conformational ensemble. The clustering quality is evaluated on three toy models with different properties, where the method is shown to consistently outperform other conventional and state-of-the-art clustering methods. Finally, the method is applied to a temperature-enhanced molecular dynamics simulation of Ca²⁺-bound calmodulin. Through the free energy landscape, we discover a pathway between a canonical and a compact state, revealing conformational changes driven by electrostatic interactions.
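
As a minimal illustration of what a Gaussian mixture free energy estimate looks like, the sketch below evaluates the standard relation F(x) = -k_B T ln ρ(x) for a one-dimensional mixture density ρ and locates the free energy minima on a grid. It is not the InfleCS algorithm: the mixture parameters (`WEIGHTS`, `MEANS`, `STDS`), the grid, and the choice of energy units with k_B T = 1 are hypothetical values picked for illustration, and in practice the mixture would be fitted to simulation data rather than specified by hand.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1D normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def mixture_density(x, weights, means, stds):
    """Weighted sum of Gaussian components evaluated at x."""
    return sum(w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, stds))


def free_energy(x, weights, means, stds, kT=1.0):
    """Free energy F(x) = -kT * ln(rho(x)), in units of kT."""
    return -kT * math.log(mixture_density(x, weights, means, stds))


# Hypothetical two-state toy landscape (assumed values, not from the paper).
WEIGHTS = [0.6, 0.4]
MEANS = [-1.0, 1.5]
STDS = [0.4, 0.5]

# Evaluate the free energy profile on a regular grid over [-3, 3].
grid = [-3.0 + 0.01 * i for i in range(601)]
profile = [free_energy(x, WEIGHTS, MEANS, STDS) for x in grid]

# Shift the profile so that the global minimum sits at zero.
f_min = min(profile)
profile = [f - f_min for f in profile]

# Grid points that are lower than both neighbors are local free energy minima.
minima = [
    grid[i]
    for i in range(1, len(grid) - 1)
    if profile[i] < profile[i - 1] and profile[i] < profile[i + 1]
]
print("free energy minima near x =", [round(x, 2) for x in minima])
```

For well-separated components such as these, the minima fall close to the component means, and the barrier between them corresponds to the low-density region that configurations on transition pathways occupy.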
arXiv:1905.03110v2 [physics.bio-ph] 30 Oct 2019

Gaussian mixture models provide accurate estimates of free energy landscapes [1]. Determining metastable core states within a protein's free energy landscape is key to obtaining important biological insights. However, extracting such states from molecular dynamics (MD) simulations with conventional clustering methods is far from straightforward.

First of all, we are interested in the metastable configurations at free energy minima, the so-called core states. Since proteins move continuously as they explore free energy landscapes, it is difficult to assess an exact state boundary. Moreover, configurations on transition pathways between metastable states generally contribute to noise when characterizing these states. On top of this, the original data is high dimensional, and the necessary dimensionality reduction results in poorly separated states. Finally, the number of metastable core states is typically not known a priori. Thus, to robustly characterize states without any knowledge of the conformational ensemble, we need a clustering method that is solely based on the data.

Many popular clustering methods are based on simple geometric criteria [2-5]. K-means and agglomerative-Ward, for example, attempt to minimize the within-cluster variance. They work very well on datasets with well-separated spherical clusters, but fail when these assumptions are not met. Spectral clustering [6], on the other hand, can accurately assign labels to nonconvex clusters by performing spectral embedding prior to K-means clustering. The spectral embedding involves learning the data manifold using local neighborhoods around data points.

In general, geometric clustering methods assign labels to all points and may not accurately identify the boundary between states at the free energy barrier, which leads to noisy state definitions. An idea is to use th...