Extracting insight from the enormous quantity of data generated by molecular simulations requires the identification of a small number of collective variables whose corresponding low-dimensional free-energy landscape retains the essential features of the underlying system. Data-driven techniques provide a systematic route to constructing this landscape, without the need for extensive a priori intuition about the relevant driving forces. In particular, autoencoders are powerful tools for dimensionality reduction, as they naturally impose an information bottleneck and, thereby, a low-dimensional embedding of the essential features. While variational autoencoders ensure continuity of the embedding by assuming a unimodal Gaussian prior, this assumption is at odds with the multi-basin free-energy landscapes that typically arise from the identification of meaningful collective variables. In this work, we incorporate this physical intuition into the prior by employing a Gaussian mixture variational autoencoder (GMVAE), which encourages the separation of metastable states within the embedding. The GMVAE performs dimensionality reduction and clustering within a single unified framework, and is capable of identifying the inherent dimensionality of the input data, in terms of the number of Gaussians required to categorize the data. We illustrate our approach on two toy models, on alanine dipeptide, and on a challenging disordered peptide ensemble, demonstrating the enhanced clustering effect of the GMVAE prior compared to standard VAEs. The resulting embeddings appear to be promising representations for constructing Markov state models, highlighting the transferability of the dimensionality reduction from static equilibrium properties to dynamics.
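To make the contrast between the two priors concrete, the following is a minimal sketch, not the authors' implementation. It evaluates the log-density of a unimodal standard Gaussian prior and of a two-component Gaussian mixture prior at two latent points that sit in separate basins. All names, the 2D latent space, and the chosen means, variances, and weights are assumptions made for illustration only.

```python
import numpy as np

def log_gaussian(z, mean, var):
    """Log-density of an isotropic Gaussian, summed over latent dimensions."""
    z = np.atleast_2d(z)
    return -0.5 * np.sum((z - mean) ** 2 / var + np.log(2.0 * np.pi * var), axis=-1)

def component_log_joint(z, means, variances, weights):
    """log(w_k) + log N(z | mean_k, var_k) for each component k; shape (K, N)."""
    return np.stack(
        [np.log(w) + log_gaussian(z, np.asarray(m), v)
         for m, v, w in zip(means, variances, weights)],
        axis=0,
    )

def log_mixture_prior(z, means, variances, weights):
    """Log-density of the mixture prior, using a stable log-sum-exp over components."""
    comp = component_log_joint(z, means, variances, weights)
    top = comp.max(axis=0)
    return top + np.log(np.exp(comp - top).sum(axis=0))

def responsibilities(z, means, variances, weights):
    """Posterior probability of each component for each point; shape (N, K)."""
    comp = component_log_joint(z, means, variances, weights)
    return np.exp(comp - log_mixture_prior(z, means, variances, weights)).T

# Hypothetical embedding: two points, one per metastable basin.
z = np.array([[-3.0, 0.0], [3.0, 0.0]])
means = [[-3.0, 0.0], [3.0, 0.0]]      # assumed component centres
variances = [0.25, 0.25]               # assumed isotropic variances
weights = [0.5, 0.5]                   # assumed mixture weights

lp_mixture = log_mixture_prior(z, means, variances, weights)
lp_unimodal = log_gaussian(z, np.zeros(2), 1.0)
labels = responsibilities(z, means, variances, weights).argmax(axis=1)
```

Under these assumed parameters the mixture prior assigns a higher log-density to both basin centres than the unimodal prior does, and the component responsibilities double as cluster labels, which is the sense in which a mixture prior can combine embedding and clustering.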
It is widely believed that the interactions of proteins with ligands and other proteins are determined by their dynamic characteristics rather than by static, time-invariant processes alone. We propose a novel computational technique, called ProteinAC (PAC), that can be used to analyze small-scale functional protein motions, as well as interactions with ligands, directly in the frequency domain. PAC was inspired by a frequency-domain analysis technique that is widely used in electronic circuit design, and it can be applied to both coarse-grained and all-atom models. It can be considered a generalization of previously proposed static perturbation-response methods, in which the frequency of the perturbation becomes the key parameter. We discuss the precise relationship of PAC to static perturbation-response schemes. We show that the frequency of the perturbation may be an important factor in protein dynamics: perturbations at different frequencies may result in completely different response behavior even when magnitude and direction are kept constant. Furthermore, we introduce several novel frequency-dependent metrics that can be computed via PAC in order to characterize response behavior. We present results for the ferric binding protein that demonstrate the potential utility of the proposed techniques.
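The relationship between a static and a frequency-dependent perturbation response can be illustrated on a toy system. The sketch below is not PAC itself; it assumes a damped, linear bead-spring chain with uniform mass and damping (all hypothetical choices) and solves for the steady-state displacement amplitude under a sinusoidal force. At zero frequency the expression reduces to the static response obtained by inverting the stiffness matrix.

```python
import numpy as np

def chain_stiffness(n, k=1.0):
    """Stiffness matrix of n beads in a line joined by springs of constant k,
    with both ends tethered so that the matrix is invertible (toy assumption)."""
    K = 2.0 * k * np.eye(n)
    K -= k * (np.eye(n, k=1) + np.eye(n, k=-1))
    return K

def frequency_response(K, force, omega, mass=1.0, damping=0.5):
    """Complex steady-state displacement amplitude x for a force f*exp(i*omega*t).

    Solves (K - omega^2 * M + i * omega * C) x = f with M = mass * I and
    C = damping * I; uniform mass and damping are assumptions of this sketch.
    """
    n = K.shape[0]
    identity = np.eye(n)
    dynamic = K - omega**2 * mass * identity + 1j * omega * damping * identity
    return np.linalg.solve(dynamic, force.astype(complex))

# Same force magnitude and direction, applied to bead 0 at different frequencies.
K = chain_stiffness(5)
force = np.zeros(5)
force[0] = 1.0

static = np.linalg.solve(K, force)               # static perturbation response
x_zero = frequency_response(K, force, omega=0.0)
x_slow = frequency_response(K, force, omega=0.1)
x_fast = frequency_response(K, force, omega=1.5)

# Amplitude of each bead's response at each frequency.
amplitudes = {w: np.abs(x) for w, x in [(0.1, x_slow), (1.5, x_fast)]}
```

In this toy setting the per-bead amplitude profile changes with the driving frequency even though the force vector is unchanged, which mirrors the qualitative point made in the abstract.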