2011
DOI: 10.1002/prot.22934

Learning generative models for protein fold families

Abstract: Statistical models of the amino acid composition of the proteins within a fold family are widely used in science and engineering. Existing techniques for learning probabilistic graphical models from multiple sequence alignments either make strong assumptions about the conditional independencies within the model (e.g., HMMs), or else use sub-optimal algorithms to learn the structure and parameters of the model. We introduce an approach to learning the topological structure and parameters of an undirected probabilistic graphical model…

Cited by 327 publications (413 citation statements), spanning 2012–2024.
References 39 publications.
“…Previous studies have developed a number of techniques to do this (Mézard and Mora 2009; Weigt et al. 2009; Balakrishnan et al. 2011; Cocco and Monasson 2011; Morcos et al. 2011; Haq et al. 2012; Jones et al. 2012; Ekeberg et al. 2013; Ferguson et al. 2013; Barton et al. 2016a). Following Ferguson et al. (2013), we estimate the bivariate marginals given a set of fields and couplings by generating sequences through Markov chain Monte Carlo (MCMC), where the Metropolis acceptance criterion for a generated sequence is proportional to the exponentiated Potts Hamiltonian.…”
Section: Model Inference (mentioning)
confidence: 99%
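As a concrete illustration of the sampling scheme described in the quoted passage, here is a minimal sketch (not the cited authors' code): single-site Metropolis MCMC under a Potts model with P(seq) ∝ exp(Σᵢ hᵢ(σᵢ) + Σᵢ<ⱼ Jᵢⱼ(σᵢ, σⱼ)), followed by empirical bivariate marginals over the sampled sequences. The storage convention for J and all function names are assumptions for illustration.

```python
import numpy as np

# Assumes J is a symmetric (L, L, q, q) tensor with J[i, j, a, b] == J[j, i, b, a]
# and J[i, i] == 0, and h is an (L, q) field matrix. Names are illustrative.

def metropolis_sequences(h, J, n_samples, sweeps_between=50, rng=None):
    """Draw sequences from the Potts distribution via single-site Metropolis."""
    rng = np.random.default_rng(rng)
    L, q = h.shape
    js = np.arange(L)
    seq = rng.integers(q, size=L)              # random initial sequence
    samples = np.empty((n_samples, L), dtype=np.int64)
    for s in range(n_samples):
        for _ in range(sweeps_between * L):    # decorrelate between samples
            i = rng.integers(L)
            a, b = seq[i], rng.integers(q)     # current and proposed state
            # The log-probability change involves only terms touching site i
            # (J[i, i] == 0 makes the j == i term vanish automatically).
            dlogp = (h[i, b] - h[i, a]
                     + J[i, js, b, seq].sum() - J[i, js, a, seq].sum())
            if dlogp >= 0 or rng.random() < np.exp(dlogp):
                seq[i] = b                     # Metropolis acceptance
        samples[s] = seq
    return samples

def bivariate_marginals(samples, q):
    """Empirical pair frequencies f[i, j, a, b] over the sampled sequences."""
    n, L = samples.shape
    idx = np.arange(L)
    f = np.zeros((L, L, q, q))
    for seq in samples:
        f[idx[:, None], idx[None, :], seq[:, None], seq[None, :]] += 1.0
    return f / n
```

Because the Metropolis ratio depends only on terms touching the flipped site, each proposal costs O(L) rather than requiring the full Hamiltonian, which is what makes this estimator practical for long alignments.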
“…The most popular approximation is to maximize the pseudo-likelihood instead of the likelihood, as it can be shown to converge to the same solution for large numbers of samples and is fast to compute [2,10,50]. Even though pseudo-likelihood maximization gives predicted residue–residue contacts of the same quality as full-likelihood optimization, several studies have revealed that the pseudo-likelihood model is inaccurate and unable to accurately reproduce the empirical alignment statistics [7,12].…”
mentioning
confidence: 99%
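The speed advantage mentioned in the quote comes from the structure of the pseudo-likelihood factors: the conditional distribution of one site given the rest of the sequence normalizes over only q states, so the global partition function is never computed. A minimal sketch under the same assumed Potts parameterization as above (names illustrative):

```python
import numpy as np

# Assumes h is (L, q) and J is a symmetric (L, L, q, q) tensor with J[i, i] == 0.

def site_conditional(i, seq, h, J):
    """P(sigma_i = a | rest of sequence) for all q values of a."""
    L, q = h.shape
    js = np.arange(L)
    # logits[a] = h_i(a) + sum_{j != i} J_ij(a, seq_j)
    logits = h[i] + J[i, js, :, seq].sum(axis=0)
    logits -= logits.max()                     # numerical stability
    p = np.exp(logits)
    return p / p.sum()

def log_pseudolikelihood(msa, h, J):
    """Sum of per-site log conditionals over every sequence in the alignment."""
    return sum(
        np.log(site_conditional(i, seq, h, J)[seq[i]])
        for seq in msa
        for i in range(len(seq))
    )
```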
“…The approach that has consistently been found to work best for residue contact prediction is the pseudo-likelihood approximation, in which we replace the likelihood with the pseudo-likelihood and maximize the regularized log pseudo-likelihood [2,10,50].…”
mentioning
confidence: 99%
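The quoted passage introduces the regularized log pseudo-likelihood objective. In the standard form used by these methods (the symbols and regularization conventions below are assumptions, not necessarily the quoted paper's exact notation), the estimate over an alignment of M sequences σ⁽¹⁾, …, σ⁽ᴹ⁾ of length L is

\[
\max_{h, J} \; \sum_{m=1}^{M} \sum_{i=1}^{L} \log P\!\left(\sigma^{(m)}_{i} \,\middle|\, \sigma^{(m)}_{\setminus i};\, h, J\right) \;-\; \lambda_h \lVert h \rVert_2^2 \;-\; \lambda_J \lVert J \rVert_2^2,
\]

where each conditional normalizes over the q states of a single site:

\[
P\!\left(\sigma_i = a \mid \sigma_{\setminus i}\right)
= \frac{\exp\!\Big(h_i(a) + \sum_{j \neq i} J_{ij}(a, \sigma_j)\Big)}
       {\sum_{b=1}^{q} \exp\!\Big(h_i(b) + \sum_{j \neq i} J_{ij}(b, \sigma_j)\Big)}.
\]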
“…In contrast to more traditional approaches based on homology detection and sequence conservation, contact prediction supported by residue coevolution (15–25) makes use of sequence variability as an alternative source of information (26). The analysis of residue coevolution has been successfully applied to contact prediction at the interface of protein dimers (27–33), eventually leading to de novo prediction of protein complexes assisted by coevolution (29,30).…”
mentioning
confidence: 99%
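Coupling matrices inferred by these methods are typically collapsed into a single score per residue pair before contacts are predicted. A minimal sketch of one common recipe, the Frobenius norm of each coupling block followed by the average-product correction (APC), is below; this is a generic scoring scheme, not the specific pipeline of the works quoted above, and the names are illustrative.

```python
import numpy as np

def contact_scores(J):
    """APC-corrected Frobenius-norm scores S[i, j] from couplings J of shape (L, L, q, q)."""
    L = J.shape[0]
    F = np.sqrt((J ** 2).sum(axis=(2, 3)))   # Frobenius norm per residue pair
    np.fill_diagonal(F, 0.0)
    # APC suppresses background row/column effects (e.g., conservation, phylogeny)
    row_mean = F.sum(axis=1) / (L - 1)       # mean score involving each residue
    total_mean = F.sum() / (L * (L - 1))     # mean over all ordered pairs
    S = F - np.outer(row_mean, row_mean) / total_mean
    np.fill_diagonal(S, 0.0)
    return S
```

In practice, residue pairs are ranked by S, a minimum sequence separation (e.g., |i − j| > 4) is enforced, and the top-scoring pairs are taken as predicted contacts.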