2014
DOI: 10.1371/journal.pone.0092721

Fast and Accurate Multivariate Gaussian Modeling of Protein Families: Predicting Residue Contacts and Protein-Interaction Partners

Abstract: In the course of evolution, proteins show a remarkable conservation of their three-dimensional structure and their biological function, leading to strong evolutionary constraints on the sequence variability between homologous proteins. Our method aims at extracting such constraints from rapidly accumulating sequence data, and thereby at inferring protein structure and function from sequence information alone. Recently, global statistical inference methods (e.g. direct-coupling analysis, sparse inverse covarian…

Cited by 143 publications (194 citation statements)
References 48 publications
“…Our approach is based on pairwise maximum entropy models, which have proved successful at predicting residue contacts between known interaction partners (7, 15–19). To our knowledge, the important problem of predicting interaction partners among paralogs from sequences has only been addressed by Burger and van Nimwegen (6), who used a Bayesian network method.…”
Section: Discussion (mentioning)
confidence: 99%
“…The ability to accurately predict interaction partners without training data is surprising; to understand it, we examine the evolution of the model over iterations of the IPA. In a well-trained model, the residue pairs with the largest couplings have been shown to correspond to contacts in the protein complex (7,16,31). Up to iteration ∼100–150 (with N_increment = 6), models starting from random pairings do no better than chance at identifying contacts.…”
Section: IPA (mentioning)
confidence: 99%
“…The basis of the paralog matching procedure is the GaussDCA formulated in ref. 28. Let us assume a matched MSA A of M sequences of length L. The MSA is transformed into an M × 20L-dimensional binary array X by replacing each amino acid with a distinct 20-dimensional vector containing one entry "1" and 19 entries "0"; gaps are represented by zero vectors.…”
Section: Methods (mentioning)
confidence: 99%
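
The encoding described in the statement above can be illustrated with a minimal sketch. It is not the cited authors' code: the amino-acid ordering in `AMINO_ACIDS`, the helper name `one_hot_encode_msa`, the toy alignment, and the choice to treat any non-standard symbol like a gap are all assumptions made for illustration.

```python
# Hypothetical alphabet ordering; the source does not specify one.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode_msa(msa):
    """Turn an MSA of M sequences of length L into an M x 20L binary array.

    Each amino acid becomes a 20-dimensional vector with a single 1;
    gaps (and, by assumption, any other unknown symbol) become zero vectors.
    """
    if not msa:
        return []
    length = len(msa[0])
    encoded = []
    for seq in msa:
        if len(seq) != length:
            raise ValueError("all sequences in an MSA must have the same length")
        row = [0] * (20 * length)
        for pos, residue in enumerate(seq):
            idx = AA_INDEX.get(residue)
            if idx is not None:
                # Set the entry for this amino acid inside the block of site `pos`.
                row[20 * pos + idx] = 1
        encoded.append(row)
    return encoded


# Toy alignment (made up): M = 3 sequences, L = 3 sites, "-" marks a gap.
toy_msa = ["AC-", "ACD", "G-D"]
X = one_hot_encode_msa(toy_msa)
```

Each row of `X` has 20L entries, and the number of ones in a row equals the number of non-gap positions in the corresponding sequence.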
“…for residue contact prediction simply takes the L2 norm of the 20 × 20-dimensional vector w_ij with components w_ij(a, b) [3,10,11,31,50],…”
mentioning
confidence: 99%
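
The score mentioned in the statement above, the L2 norm of the 20 × 20 coupling block w_ij, can be sketched as follows. This covers only the norm itself; the function name `coupling_score` and the toy coupling values are hypothetical, and any further corrections applied by the cited works are not shown because the excerpt is truncated.

```python
import math


def coupling_score(w_ij):
    """L2 (Frobenius) norm of a coupling block w_ij, given as a list of rows.

    w_ij[a][b] holds the coupling between amino acid a at site i
    and amino acid b at site j.
    """
    return math.sqrt(sum(value * value for row in w_ij for value in row))


# Toy 20 x 20 coupling block (made-up values): two non-zero couplings.
w = [[0.0] * 20 for _ in range(20)]
w[0][1] = 3.0
w[2][5] = 4.0
score = coupling_score(w)
```

A larger score for a residue pair (i, j) indicates stronger overall coupling between the two sites, which is what contact predictors of this kind rank by.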