Variational Autoencoders for Protein Structure Prediction

Alam, Fardina Fathmiul; Shehu, Amarda

doi:10.1145/3388440.3412471

Cited by 10 publications

(9 citation statements)

References 25 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…Several previous studies ( Alam et al, 2020 ; Alam and Shehu, 2020 ; Guo et al, 2020 ) focused on the evaluation of autoencoders on the generation of nonlinear featurization and the learned nonlinear representations of protein tertiary structures. In the current study, a similar strategy was employed to quantify and compare the performance of autoencoders and variational autoencoders.…”

Section: Methodsmentioning

confidence: 99%

“…Moreover, to evaluate the quality of deep learning models, two distance-based metrics, maximum mean discrepancy and earth mover’s distance, were applied to compare the training and generated distributions. Following the strategy from a previous study ( Alam and Shehu, 2020 ), RMSDs were calculated as a proxy variable representing the protein tertiary structures. 1) Maximum mean discrepancy (MMD).…”

Section: Methodsmentioning

confidence: 99%

See 1 more Smart Citation

Explore Protein Conformational Space With Variational Autoencoder

Tian

Jiang

Trozzi

et al. 2021

Front. Mol. Biosci.

View full text Add to dashboard Cite

Molecular dynamics (MD) simulations have been actively used in the study of protein structure and function. However, extensive sampling in the protein conformational space requires large computational resources and takes a prohibitive amount of time. In this study, we demonstrated that variational autoencoders (VAEs), a type of deep learning model, can be employed to explore the conformational space of a protein through MD simulations. VAEs are shown to be superior to autoencoders (AEs) through a benchmark study, with low deviation between the training and decoded conformations. Moreover, we show that the learned latent space in the VAE can be used to generate unsampled protein conformations. Additional simulations starting from these generated conformations accelerated the sampling process and explored hidden spaces in the conformational landscape.

show abstract

Section: Methodsmentioning

confidence: 99%

Section: Methodsmentioning

confidence: 99%

Explore Protein Conformational Space With Variational Autoencoder

Tian

Jiang

Trozzi

et al. 2021

Front. Mol. Biosci.

View full text Add to dashboard Cite

show abstract

“…While work on this is largely beginning, various generative deep models can be found in literature [ 17 ]. They aim to learn directly from tertiary structures typically represented as contact maps or distance matrices through primarily variational autoencoders (VAEs) [ 18 , 19 ] or generative adversarial networks (GANs) [ 20 , 21 , 22 ]. Until recently [ 23 ], the majority of these models were limited to learning from same-length protein fragments.…”

Section: Introductionmentioning

confidence: 99%

Data Size and Quality Matter: Generating Physically-Realistic Distance Maps of Protein Tertiary Structures

Alam

Shehu

2022

Biomolecules

Self Cite

View full text Add to dashboard Cite

With the debut of AlphaFold2, we now can get a highly-accurate view of a reasonable equilibrium tertiary structure of a protein molecule. Yet, a single-structure view is insufficient and does not account for the high structural plasticity of protein molecules. Obtaining a multi-structure view of a protein molecule continues to be an outstanding challenge in computational structural biology. In tandem with methods formulated under the umbrella of stochastic optimization, we are now seeing rapid advances in the capabilities of methods based on deep learning. In recent work, we advance the capability of these models to learn from experimentally-available tertiary structures of protein molecules of varying lengths. In this work, we elucidate the important role of the composition of the training dataset on the neural network’s ability to learn key local and distal patterns in tertiary structures. To make such patterns visible to the network, we utilize a contact map-based representation of protein tertiary structure. We show interesting relationships between data size, quality, and composition on the ability of latent variable models to learn key patterns of tertiary structure. In addition, we present a disentangled latent variable model which improves upon the state-of-the-art variable autoencoder-based model in key, physically-realistic structural patterns. We believe this work opens up further avenues of research on deep learning-based models for computing multi-structure views of protein molecules.

show abstract

“…We note that more progress has been made recently with Variational Autoencoders (VAEs), which provide a generative framework complementary to GANs. We point here two representative works in this area [ 19 , 20 ]. However, these works train a VAE on structures generated for a specific protein molecule, and these structures are obtained from computational platforms, such as MD simulations [ 19 ] or protein structure prediction platforms, such as Rosetta [ 20 ].…”

Section: Introductionmentioning

confidence: 99%

“…We point here two representative works in this area [ 19 , 20 ]. However, these works train a VAE on structures generated for a specific protein molecule, and these structures are obtained from computational platforms, such as MD simulations [ 19 ] or protein structure prediction platforms, such as Rosetta [ 20 ]. None of these works leverage known experimental structures in the PDB, which has been the trend in the nascent sub-area of GANs for protein structure modeling as a way of learning from the actual ground truth distribution rather than other computational frameworks.…”

Section: Introductionmentioning

confidence: 99%

Generative Adversarial Learning of Protein Tertiary Structures

Rahman

Zhao

et al. 2021

Molecules

Self Cite

View full text Add to dashboard Cite

Protein molecules are inherently dynamic and modulate their interactions with different molecular partners by accessing different tertiary structures under physiological conditions. Elucidating such structures remains challenging. Current momentum in deep learning and the powerful performance of generative adversarial networks (GANs) in complex domains, such as computer vision, inspires us to investigate GANs on their ability to generate physically-realistic protein tertiary structures. The analysis presented here shows that several GAN models fail to capture complex, distal structural patterns present in protein tertiary structures. The study additionally reveals that mechanisms touted as effective in stabilizing the training of a GAN model are not all effective, and that performance based on loss alone may be orthogonal to performance based on the quality of generated datasets. A novel contribution in this study is the demonstration that Wasserstein GAN strikes a good balance and manages to capture both local and distal patterns, thus presenting a first step towards more powerful deep generative models for exploring a possibly very diverse set of structures supporting diverse activities of a protein molecule in the cell.

show abstract

Variational Autoencoders for Protein Structure Prediction

Cited by 10 publications

References 25 publications

Explore Protein Conformational Space With Variational Autoencoder

Explore Protein Conformational Space With Variational Autoencoder

Data Size and Quality Matter: Generating Physically-Realistic Distance Maps of Protein Tertiary Structures

Generative Adversarial Learning of Protein Tertiary Structures

Contact Info

Product

Resources

About