2015
DOI: 10.1101/028936
Preprint

Benchmarking inverse statistical approaches for protein structure and design with exactly solvable models

Abstract: Inverse statistical approaches to determine protein structure and function from Multiple Sequence Alignments (MSA) are emerging as powerful tools in computational biology. However, the underlying assumptions about the relationship between the inferred effective Potts Hamiltonian and real protein structure and energetics remain untested so far. Here we use a lattice protein (LP) model to benchmark those inverse statistical approaches. We build MSA of highly stable sequences in target LP structures, and infer the effe…

Cited by 20 publications (38 citation statements)
References 39 publications (104 reference statements)
“…Here, B is the number of sequences in the MSA, δ(a, b) = 1 if states a and b are the same, and δ(a, b) = 0 otherwise. To solve the inference problem, we use the adaptive cluster expansion (ACE) and Boltzmann machine learning algorithms developed by Barton et al, which avoid over-fitting by constructing a sparse network of interactions sufficient to reproduce the observed frequencies to within errors due to finite sampling [15] (see Jacquin et al [16] for a comparison of different methods).…”
Section: Methods (mentioning)
confidence: 99%
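
The symbols B and δ in the statement above usually appear in the empirical frequencies that a Potts model is fitted to. The excerpt elides the formula itself, so the Python sketch below assumes the common form f_i(a) = (1/B) Σ_k δ(a, a_i^k) and its pairwise analogue. The function names and the toy alignment are illustrative and are not taken from the citing paper.

```python
def delta(a, b):
    """Kronecker delta: 1 if states a and b are the same, 0 otherwise."""
    return 1 if a == b else 0


def single_site_frequency(msa, i, a):
    """Assumed form: f_i(a) = (1/B) * sum_k delta(a, a_i^k)."""
    B = len(msa)  # B is the number of sequences in the MSA
    return sum(delta(a, seq[i]) for seq in msa) / B


def pair_frequency(msa, i, a, j, b):
    """Assumed form: f_ij(a, b) = (1/B) * sum_k delta(a, a_i^k) * delta(b, a_j^k)."""
    B = len(msa)
    return sum(delta(a, seq[i]) * delta(b, seq[j]) for seq in msa) / B


# Toy alignment (hypothetical data): 4 sequences of length 2.
toy_msa = ["AC", "AD", "GC", "AC"]
```

On the toy alignment, `single_site_frequency(toy_msa, 0, "A")` counts three matching sequences out of four. Inference methods such as those named in the statement then adjust model parameters until the model reproduces these frequencies within sampling error.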
“…Explicit sampling of fitness and epistatic effects by mutagenesis is costly, and is limited to proteins that can be expressed and evolved in the laboratory. However, for certain proteins [14], the number of known sequences is sufficient to infer the fitness effects of mutations by machine learning methods, for example, by adjusting the parameters of a Potts spin model to recover the frequencies of amino acids in a protein alignment [14][15][16][17].…”
Section: Introduction (mentioning)
confidence: 99%
“…We calculate the interaction energy for every possible protein pair within each species by summing the interprotein couplings assigned by the model. Such "energies" capture evolutionary correlations, and correlate to physical energies for lattice proteins (30). Using these interaction energies, we predict protein pairs [assuming one-to-one specific HK−RR interactions (28), Fig.…”
Section: Ipa (mentioning)
confidence: 99%
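
The scoring step described in the statement above, summing inter-protein couplings over a candidate pair, can be sketched in Python as follows. This is a minimal illustration under assumed conventions: the nested-dictionary layout of `couplings`, the function name, and the toy values are hypothetical, and the sign convention of the resulting "energy" depends on how the model was inferred.

```python
def interaction_energy(seq_a, seq_b, couplings):
    """Sum inter-protein couplings over all site pairs (i in A, j in B).

    couplings is a hypothetical layout: {(i, j): {(a, b): value}}, where i
    indexes sites of protein A, j indexes sites of protein B, and (a, b)
    are the residues found at those sites. Missing entries count as zero.
    """
    total = 0.0
    for i, a in enumerate(seq_a):
        for j, b in enumerate(seq_b):
            total += couplings.get((i, j), {}).get((a, b), 0.0)
    return total


# Toy couplings (hypothetical values), for a 2-site and a 1-site protein.
toy_couplings = {
    (0, 0): {("A", "C"): -1.0},
    (1, 0): {("G", "C"): 0.5},
}
toy_energy = interaction_energy("AG", "C", toy_couplings)
```

Computing this score for every possible pair within a species gives the table of interaction energies from which the statement's pairing predictions are drawn.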
“…Recently, probabilistic models, called Potts models, have been used to assign scores to individual protein sequences which correlate with experimental measures of fitness (Haq et al 2012; Ferguson et al 2013; Mann et al 2014; Figliuzzi et al 2015; Hopf et al 2017). These advances build upon previous and ongoing work in which Potts models have been used to extract information from sequence data regarding tertiary and quaternary structure of protein families (Weigt et al 2009; Morcos et al 2011, 2014; Marks et al 2012; Sulkowska et al 2012; Sutto et al 2015; Barton et al 2016a; Haldane et al 2016; Jacquin et al 2016) and sequence-specific quantitative predictions of viral protein stability and fitness (Haq et al 2012; Shekhar et al 2013; Barton et al 2016b; Butler et al 2016).…”
Section: Introduction (mentioning)
confidence: 99%