2020
DOI: 10.1101/2020.11.29.402875
Preprint

The structure-fitness landscape of pairwise relations in generative sequence models

Abstract: If disentangled properly, patterns distilled from evolutionarily related sequences of a given protein family can inform their traits - such as their structure and function. Recent years have seen an increase in the complexity of generative models towards capturing these patterns, from sitewise to pairwise to deep and variational. In this study we evaluate the degree of structure and fitness patterns learned by a suite of progressively complex models. We introduce pairwise saliency, a novel method for evaluating…

Cited by 8 publications (10 citation statements)
References 47 publications

“…The performance of the IS model is consistent with previous studies where independent effect terms explain a large fraction of trait variability (Otwinowski et al., 2018). Furthermore, our VAE model can be thought of as a mixture model, with each set of initial coordinates emitting a single sequence profile, within which additive effects explain the majority of variation (Dauparas et al., 2019; Marshall et al.).…”
Section: Discussion (supporting)
confidence: 87%
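To make the mixture-model reading of the VAE quoted above concrete, the decomposition can be written as follows (a generic sketch, not the cited authors' exact notation): an independent-site (IS) model uses a single global per-site profile, while a VAE whose decoder outputs a site-factorized profile conditioned on the latent code becomes, after marginalizing the latent prior, a continuous mixture of such additive models.

```latex
% Independent-site (IS) model: one global per-site profile
P_{\mathrm{IS}}(x) = \prod_{i=1}^{L} p_i(x_i)

% VAE viewed as a continuous mixture: each latent code z emits its own
% site-independent profile, and the prior p(z) mixes over them
P_{\mathrm{VAE}}(x) = \int p(z) \prod_{i=1}^{L} p_i(x_i \mid z) \, dz
```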
“…In Ref. [13], for example, the authors test many different hyperparameters for variational autoencoders, which might also have an influence on how well the resulting distributions are approximated by pairwise models.…”
Section: Discussion (mentioning)
confidence: 99%
“…These models optimize an approximation of the pseudo-likelihood function called self-supervision or masked-language-modelling (31). We suspect the Jacobian of these models can be computed (8) and regularized with LH to promote sparsity in the hidden representations.…”
Section: RAFT (mentioning)
confidence: 99%
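To illustrate what computing and regularizing such a Jacobian could look like in practice, here is a minimal sketch (everything in it is an assumption for illustration: the toy linear `logits` function stands in for a real masked language model, and a plain L1 penalty stands in for the "LH" regularizer named in the quote).

```python
import numpy as np

# Toy stand-in for a masked-language model over a protein sequence.
# A real model would replace `logits` with the network's per-position outputs.
L, A = 10, 20                                   # sequence length, alphabet size
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(L * A, L * A))  # toy coupling weights

def logits(x_onehot):
    """Per-position logits for a one-hot encoded sequence of shape (L, A)."""
    return (W @ x_onehot.reshape(-1)).reshape(L, A)

x = np.eye(A)[rng.integers(0, A, size=L)]       # random one-hot input sequence

# Finite-difference "categorical Jacobian": J[i, a] holds the change in the
# full logit table when position i of the input is set to letter a.
base = logits(x)
J = np.zeros((L, A, L, A))
for i in range(L):
    for a in range(A):
        x_mut = x.copy()
        x_mut[i] = 0.0
        x_mut[i, a] = 1.0
        J[i, a] = logits(x_mut) - base

# Generic L1 sparsity penalty, standing in for the regularizer in the quote.
sparsity_penalty = np.abs(J).sum()

# Collapse the alphabet dimensions into an L x L pairwise map, the kind of
# object typically compared against structural contacts.
pair_map = np.sqrt((J ** 2).sum(axis=(1, 3)))
print(pair_map.shape, float(sparsity_penalty))
```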
“…The parameters of these models have been inferred using a plethora of methods such as GREMLIN (3), plmDCA (4), bmDCA (5), PSICOV (6) and mfDCA (7). This also includes the most recent low-rank reparametrizations such as restricted Boltzmann machines or variational autoencoders (8), and self-attention-based models that share MRF parameters across protein families (9). The parameters from these models are used for protein structure prediction (10-14), protein-protein interaction prediction (15-17), protein design (18-20), mutation effect prediction (21, 22), and protein sequence alignment and homology search (23-26).…”
Section: Introduction (mentioning)
confidence: 99%
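For orientation on what "the parameters of these models" refers to: the methods listed in that statement all estimate, in one parameterization or another, the fields and pairwise couplings of a Potts/Markov random field over an aligned sequence. A generic way to write the model, together with the pseudo-likelihood objective that GREMLIN- and plmDCA-style inference maximizes (the other listed methods use different approximations), is sketched below; the notation is illustrative rather than any one paper's exact formulation.

```latex
% Pairwise MRF (Potts model) over an aligned sequence x = (x_1, ..., x_L)
P(x) = \frac{1}{Z} \exp\Big( \sum_{i} h_i(x_i) + \sum_{i<j} J_{ij}(x_i, x_j) \Big)

% Pseudo-likelihood: the intractable partition function Z is avoided by
% maximizing the conditional of each site given all others
\mathcal{L}_{\mathrm{PL}} = \sum_{i=1}^{L} \log P(x_i \mid x_{\setminus i})
```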