Background: We examine the problem of clustering biomolecular simulations using deep learning techniques. Since biomolecular simulation datasets are inherently high dimensional, it is often necessary to build low dimensional representations that can be used to extract quantitative insights into the atomistic mechanisms that underlie complex biological processes. Results: We use a convolutional variational autoencoder (CVAE) to learn low dimensional, biophysically relevant latent features from long time-scale protein folding simulations in an unsupervised manner. We demonstrate our approach on three model protein folding systems, namely Fs-peptide (14 μs aggregate sampling), villin headpiece (single trajectory of 125 μs) and β-β-α (BBA) protein (223 + 102 μs sampling across two independent trajectories). In these systems, we show that the CVAE latent features learned correspond to distinct conformational substates along the protein folding pathways. The CVAE model predicts, on average, nearly 89% of all contacts within the folding trajectories correctly, while being able to extract folded, unfolded and potentially misfolded states in an unsupervised manner. Further, the CVAE model can be used to learn latent features of protein folding that can be applied to other independent trajectories, making it particularly attractive for identifying intrinsic features that correspond to conformational substates that share similar structural features. Conclusions: Together, we show that the CVAE model can quantitatively describe complex biophysical processes such as protein folding.
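To make the "contacts predicted correctly" figure concrete, the following is a minimal sketch of how a binary contact map and a contact-level accuracy could be computed. The abstract does not define the metric or the contact cutoff, so the 8 Å cutoff, the 0.5 decision threshold, and the function names `contact_map` and `contact_accuracy` are assumptions made for illustration only. The CVAE itself is not shown; `predicted_probs` stands in for whatever a trained decoder would output.

```python
import math

def contact_map(coords, cutoff=8.0):
    """Binary contact matrix: 1 where two residues lie closer than `cutoff`.

    `coords` is a list of (x, y, z) tuples, one per residue, in angstroms.
    The 8.0 default is an assumed cutoff, not a value taken from the abstract.
    """
    n = len(coords)
    return [[1 if math.dist(coords[i], coords[j]) < cutoff else 0
             for j in range(n)] for i in range(n)]

def contact_accuracy(true_map, predicted_probs, threshold=0.5):
    """Fraction of matrix entries whose thresholded prediction matches the true map.

    This is one plausible reading of "contacts predicted correctly" (an assumption).
    """
    total = 0
    correct = 0
    for true_row, pred_row in zip(true_map, predicted_probs):
        for t, p in zip(true_row, pred_row):
            total += 1
            correct += int((p >= threshold) == bool(t))
    return correct / total

# Toy input: four pseudo-residues on a line, spaced 5 angstroms apart,
# so only identical and adjacent residues are in contact.
coords = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0), (15.0, 0.0, 0.0)]
true_map = contact_map(coords)

# Hypothetical decoder output: the true map with one entry flipped.
predicted = [[float(v) for v in row] for row in true_map]
predicted[0][3] = 0.9
accuracy = contact_accuracy(true_map, predicted)
```

In practice the per-frame accuracies would be averaged over all frames of a trajectory; that aggregation is likewise an assumption about how an average such as the one reported might be obtained.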
The emergence and rapid worldwide spread of the novel coronavirus disease, COVID-19, has prompted concerted efforts to find successful treatments. The causative virus, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), uses its spike (S) protein to gain entry into host cells. Therefore, the S protein presents a viable target for developing a directed therapy. Here, we combined artificial intelligence with molecular dynamics simulation to provide new details of the S protein structure. Based on a comprehensive structural analysis of S proteins from SARS-CoV-2 and previous human coronaviruses, we found that the protomer state of S proteins is structurally flexible. Without the presence of a stabilizing beta sheet from another protomer chain, two regions in the S2 domain and the hinge connecting the S1 and S2 subunits lose their secondary structures. Interestingly, the region in the S2 domain was previously identified as an immunodominant site in the SARS-CoV-1 S protein. We anticipate that the molecular details elucidated here will assist in effective therapeutic development for COVID-19.