adabmDCA: adaptive Boltzmann machine learning for biological sequences
2021 · DOI: 10.1186/s12859-021-04441-9

Abstract: Background: Boltzmann machines are energy-based models that have been shown to provide an accurate statistical description of domains of evolutionarily related protein and RNA families. They are parametrized in terms of local biases accounting for residue conservation, and pairwise terms to model epistatic coevolution between residues. From the model parameters, it is possible to extract an accurate prediction of the three-dimensional contact map of the target domain. More recently, the accuracy o…
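The parametrization the abstract describes is the standard Potts form of DCA: an energy E(s) = -Σᵢ hᵢ(sᵢ) - Σᵢ<ⱼ Jᵢⱼ(sᵢ, sⱼ) over aligned sequences, with contacts predicted from the coupling strengths. Below is a minimal NumPy sketch of that energy and of the usual Frobenius-norm contact score with average-product correction (APC); the array shapes and function names are illustrative assumptions, not adabmDCA's actual API:

```python
import numpy as np

# Assumed shapes: L positions, q = 21 states (20 amino acids + gap).
# h: (L, q) local biases; J: (L, L, q, q) couplings, stored for i < j.

def potts_energy(seq, h, J):
    """E(s) = -sum_i h_i(s_i) - sum_{i<j} J_ij(s_i, s_j)."""
    L = len(seq)
    e = -sum(h[i, seq[i]] for i in range(L))
    for i in range(L):
        for j in range(i + 1, L):
            e -= J[i, j, seq[i], seq[j]]
    return e

def contact_scores(J):
    """Frobenius norm of each (q, q) coupling block, followed by the
    average-product correction (APC); higher score = predicted contact.
    Diagonal is left at zero, and means include it, as is common in
    simple implementations."""
    L = J.shape[0]
    F = np.zeros((L, L))
    for i in range(L):
        for j in range(i + 1, L):
            F[i, j] = F[j, i] = np.linalg.norm(J[i, j])
    row = F.mean(axis=1)
    return F - np.outer(row, row) / F.mean()
```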

Cited by 19 publications (20 citation statements) · References 32 publications
“…A protocol solely based on PSSMs, which neglects inter-dependencies between positions, would yield a much lower success rate, as evidenced by (i) entropy calculations, which show that explicitly modeling inter-dependencies reduces the size of the space of candidate sequences by a factor of at least ~4000 (Methods); (ii) flexible docking simulations, which show that PSSM-generated sequences have worse docking energy than cRBM-generated ones on average; and (iii) the microarray experiment, which did not identify any PSSM-generated sequence with high binding affinity to Cn. On the other hand, generative Potts models [64, 26, 65], which also integrate inter-dependencies, model equally well the distribution of natural binders (Methods, S3A Fig), and would likely perform comparably. Other machine learning generative models that account for inter-dependencies (see Wu.…”
Section: Discussion
confidence: 99%
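The entropy argument in this excerpt can be made concrete: an independent-site (PSSM) model has entropy S = -Σᵢ Σₐ fᵢ(a) log fᵢ(a), the effective number of typical sequences it generates scales as exp(S), so the quoted ~4000-fold reduction corresponds to exp(S_PSSM - S_Potts). A small sketch under those assumptions (the Potts entropy itself has no closed form and must be estimated separately, e.g. by thermodynamic integration):

```python
import numpy as np

def pssm_entropy(freqs):
    """Entropy in nats of an independent-site (PSSM) model.
    freqs: (L, q) per-column amino-acid frequencies, rows summing to 1."""
    f = np.clip(freqs, 1e-12, 1.0)
    return float(-(f * np.log(f)).sum())

def sequence_space_reduction(S_pssm, S_potts):
    """Shrinkage of the effective candidate-sequence space when moving
    from a PSSM (entropy S_pssm) to a pairwise model (entropy S_potts):
    exp(S) counts typical sequences, so the ratio is exp(S_pssm - S_potts)."""
    return float(np.exp(S_pssm - S_potts))

# Example: an entropy gap of ~8.3 nats gives the quoted ~4000-fold
# reduction, since exp(8.3) ≈ 4.0e3.
```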
“…We use GENERALIST to model sequence variability in proteins that span multiple kingdoms of life, alignment sizes, and sequence lengths. We compare the performance of GENERALIST with three other generative models, the Potts model (referred to as adabmDCA [9]), the autoregressive …”
Section: Introduction
confidence: 99%
“…However, there are significant issues with the Potts model. The associated numerical inference is computationally inefficient [9], limiting its application to small proteins and protein domains (L ≲ 100 residues). In comparison, the median protein size in many organisms, including humans, is much larger (~350 residues) [10].…”
Section: Introduction
confidence: 99%
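The inefficiency these authors point to is intrinsic to Boltzmann machine learning as used by adabmDCA: every gradient step must re-estimate the model's one- and two-site marginals by MCMC over O(L²q²) parameters. The following is a schematic sketch of one such step (Gibbs sampling plus a moment-matching update); it is a toy illustration of the technique, not adabmDCA's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def empirical_stats(msa, q):
    """One- and two-site frequencies of an integer MSA of shape (M, L)."""
    M, L = msa.shape
    f1 = np.zeros((L, q))
    f2 = np.zeros((L, L, q, q))
    for s in msa:
        for i in range(L):
            f1[i, s[i]] += 1.0
            for j in range(i + 1, L):
                f2[i, j, s[i], s[j]] += 1.0
    return f1 / M, f2 / M

def gibbs_sweep(s, h, J, q):
    """One Gibbs sweep over all sites; J[i, j] (i < j) couples the
    states at sites i and j."""
    L = len(s)
    for i in range(L):
        logits = h[i].copy()
        for j in range(i):
            logits += J[j, i, s[j], :]
        for j in range(i + 1, L):
            logits += J[i, j, :, s[j]]
        p = np.exp(logits - logits.max())
        s[i] = rng.choice(q, p=p / p.sum())

def bm_step(h, J, f1, f2, chains, q, lr=0.05, sweeps=5):
    """One Boltzmann-machine-learning step: the likelihood gradient of a
    Potts model is (data frequencies) - (model marginals), with the
    model marginals estimated from persistent MCMC chains."""
    for s in chains:
        for _ in range(sweeps):
            gibbs_sweep(s, h, J, q)
    p1, p2 = empirical_stats(np.asarray(chains), q)
    h += lr * (f1 - p1)
    J += lr * (f2 - p2)
```

The cost the citing paper complains about is visible here: each update requires many Gibbs sweeps over all L sites per chain, which is what confines straightforward Boltzmann machine learning to modest sequence lengths.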
“…The prior parameters (both for gaps and insertions) are extracted from the seed alignment in an unsupervised manner. Finally, to further speed up the learning of the seed-based objective function, we obtain the parameters of the DCA model using pseudo-likelihood maximization [4] instead of Boltzmann machine learning [5,8]. DCAlign is a computational pipeline that allows computing the seed-model parameters in a few minutes, in contrast to its original implementation, which required at least a day of computation in the best scenario.…”
Section: Introduction
confidence: 99%
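Pseudo-likelihood maximization, the substitute mentioned here, replaces the full likelihood (whose partition function sums over q^L sequences) with a product of per-site conditionals, each normalizable over just q states. A minimal sketch of that objective for a Potts model, with illustrative naming and storage conventions:

```python
import numpy as np

def conditional_logits(i, s, h, J):
    """Logits of P(s_i = a | s_{-i}) for sequence s; h: (L, q),
    J: (L, L, q, q) stored for i < j."""
    logits = h[i].copy()
    for j in range(i):
        logits += J[j, i, s[j], :]
    for j in range(i + 1, len(s)):
        logits += J[i, j, :, s[j]]
    return logits

def neg_pseudo_loglik(msa, h, J):
    """-sum_m sum_i log P(s_i^m | s_{-i}^m). Each term normalizes over
    only q states, so the q^L partition function never appears; the
    objective is convex in (h, J) and amenable to gradient descent."""
    nll = 0.0
    for s in msa:
        for i in range(len(s)):
            logits = conditional_logits(i, s, h, J)
            m = logits.max()
            logz = m + np.log(np.exp(logits - m).sum())
            nll -= float(logits[s[i]] - logz)
    return nll
```

This tractable per-site normalization is what turns the day-long Boltzmann machine learning run mentioned in the quote into a few minutes of optimization, at the cost of maximizing an approximation to the true likelihood.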