Protein 3D Structure Computed from Evolutionary Sequence Variation

Marks, Debora S.; Colwell, Lucy; Sheridan, Robert P.; Hopf, Thomas A.; Pagnani, Andrea; Zecchina, Riccardo; Sander, Chris

doi:10.1371/journal.pone.0028766

Cited by 1,083 publications

(1,503 citation statements)

References 79 publications

Supporting

Mentioning: 1,483

Contrasting


“…A total of 220 folded models for T. thermophilus RodA were generated for increasing numbers of EC restraints using the folding protocol in EVfold 9, which itself uses a distance geometry and simulated annealing protocol in CNS 32,33. All models were ranked as described previously 11,34, and the 50 top-ranked models from each of the MSAs were used as molecular replacement search models (vide infra). The full EVfold software package and ReadMe is available at https://github.com/debbiemarkslab/EVcouplings.…”

Section: Methods (mentioning)

confidence: 99%

“…Recent methodological advances in molecular replacement have expanded the range of suitable templates 8 , and evolutionary co-variation analysis now allows for fold prediction even in the absence of prior structural data 9–11 . This approach exploits the fact that residues that interact with one another structurally tend to co-evolve to maintain their interactions.…”

mentioning

confidence: 99%
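The co-evolution idea in the statement above can be illustrated with a toy calculation. The sketch below scores pairs of alignment columns by mutual information, a simple local statistic swapped in here for illustration; it is not the method of the cited work, and the function name `mutual_information` and the four-sequence alignment `toy_msa` are hypothetical.

```python
from collections import Counter
from math import log

def mutual_information(msa, i, j):
    """Mutual information (in nats) between alignment columns i and j."""
    n = len(msa)
    fi = Counter(seq[i] for seq in msa)               # single-column counts
    fj = Counter(seq[j] for seq in msa)
    fij = Counter((seq[i], seq[j]) for seq in msa)    # pair counts
    mi = 0.0
    for (a, b), count in fij.items():
        p_ab = count / n
        mi += p_ab * log(p_ab / ((fi[a] / n) * (fj[b] / n)))
    return mi

# Hypothetical toy alignment: columns 0 and 1 always change together,
# while column 3 varies independently of column 0.
toy_msa = ["AKLE", "AKLD", "GRLE", "GRLD"]
mi_coupled = mutual_information(toy_msa, 0, 1)
mi_independent = mutual_information(toy_msa, 0, 3)
```

A high score for a column pair only suggests covariation; pairwise statistics like this one cannot by themselves separate direct from indirect correlations.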


Structure of the peptidoglycan polymerase RodA resolved by evolutionary coupling analysis

Sjodt, Brock, Dobihal, et al. 2018, Nature


The Shape, Elongation, Division, and Sporulation (“SEDS”) proteins are a large family of ubiquitous and essential transmembrane enzymes with critical roles in bacterial cell wall biology. The exact function of SEDS proteins was long enigmatic, but recent work1–3 has revealed that the prototypical SEDS family member RodA is a peptidoglycan polymerase – a role previously attributed exclusively to members of the penicillin binding protein family4. This discovery has made RodA and other SEDS proteins promising targets for the development of next-generation antibiotics. However, little is known regarding the molecular basis for SEDS activity, and no structural data are available for RodA or any homolog thereof. Here, we report the crystal structure of Thermus thermophilus RodA at a resolution of 2.9 Å, determined using evolutionary covariance-based fold prediction to enable molecular replacement. The structure reveals a novel ten-pass transmembrane fold with large extracellular loops, one of which is partially disordered. The protein contains a highly conserved cavity in the transmembrane domain, reminiscent of ligand binding sites in transmembrane receptors. Mutagenesis experiments in Bacillus subtilis and Escherichia coli show that perturbation of this cavity abolishes RodA function both in vitro and in vivo, indicating it is catalytically essential. These results provide a framework for understanding bacterial cell wall synthesis and SEDS protein function.


“…HHblits has the potential to improve many downstream analysis and prediction methods, such as a de novo protein structure prediction method requiring large and accurate MSAs 16. Methods and any associated references are available in the online version of the paper at http://www.nature.com/naturemethods/.…”

mentioning

confidence: 99%

HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment

et al. 2011


…are generally too slow for iteratively searching through large sequence databases such as UniProt or NCBI's nonredundant (nr) database. Here we present HMM-HMM-based lightning-fast iterative sequence search (HHblits), which extends HHsearch to enable fast, iterative sequence searches. The profile-profile alignment prefilter of HHblits reduces the number of full HMM-HMM alignments from many millions to a few thousand, making it faster than PSI-BLAST but still as sensitive as HHsearch (Supplementary Fig. 1).

For iterative searches, HHblits needs a database of HMMs that covers the entire sequence space. We devised a very fast method, kClust (M. Hauser, C.E. Mayer and J.S., unpublished data), for clustering large sequence databases down to 20-30% maximum pairwise sequence identity while requiring almost full-length alignability (>80% coverage of longer sequences). This strict coverage criterion enriches for orthologous sequences with the same domain architecture 7: of the UniProt20 clusters containing more than two Swiss-Prot sequences with enzyme commission numbers, 98.4% had all four enzyme commission digits conserved (Supplementary Fig. 2). kClust is sufficiently fast (~1,000 times faster than BLAST) to allow for regular reclustering of the updated UniProt and nr databases. UniProt20 (the version from July 2011) contained 15 million sequences in 2.6 million HMMs, with an average of 5.5 sequences per cluster.

HHblits first converts the query sequence (or MSA) to an HMM. This is conventionally done by adding pseudocounts of amino acids that are physicochemically similar to the amino acid in the query. In contrast, HHblits calculates pseudocounts that depend on the local sequence context (that is, the 13 positions around each residue). This method had improved the sensitivity and alignment quality of the resulting profile considerably 8.

HHblits then searches the HMM database and adds the sequences from HMMs below a defined expected value (E value) threshold to the query MSA, from which the HMM for the next search iteration is built (Fig. 1a and Supplementary Fig. 3). For speed and sensitivity, the prefilter is crucial. The key idea was to implement profile-profile comparison as a sequence-to-profile comparison by discretizing the vectors of 20 amino acid probabilities in each HMM column into an alphabet of 219 letters. Each letter represents a typical profile column (Supplementary Fig. 4). We approximate the database HMMs by sequences over this extended alphabet, ignoring the insertion and deletion probabilities of the HMMs (Supplementary Fig. 5). Before prefiltering, we calculate the score of each query HMM column with each of the 219 letters, which results in a 219-row extended sequence profile. The prefiltering consists of two steps (Supplementary Fig. 3…

Building protein multiple-sequence alignments (MSAs) by iterative sequence searches is of fundamental importance in computational biology, as MSAs are a key intermediate step in the sequence-based prediction of evolutionarily conserved properties, such as tert...
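The discretization step described above (mapping each profile column to one letter of an extended alphabet) can be sketched in miniature. This is only an illustration under stated assumptions: it uses a 4-symbol alphabet and 3 prototype columns instead of 20 amino acids and 219 letters, and it picks the nearest prototype by squared Euclidean distance, which is an assumed criterion rather than the one HHblits uses. The names `nearest_letter` and `discretize_profile` are hypothetical.

```python
def nearest_letter(column, prototypes):
    """Index of the prototype column closest to `column`.

    Closeness is squared Euclidean distance (an assumption for this sketch).
    """
    def dist2(p, q):
        return sum((x - y) ** 2 for x, y in zip(p, q))
    return min(range(len(prototypes)), key=lambda k: dist2(column, prototypes[k]))

def discretize_profile(profile, prototypes):
    """Replace every profile column by the index of its nearest prototype."""
    return [nearest_letter(col, prototypes) for col in profile]

# Hypothetical prototypes: each "letter" stands for a typical profile column.
prototypes = [
    [0.7, 0.1, 0.1, 0.1],      # letter 0: first symbol dominant
    [0.1, 0.7, 0.1, 0.1],      # letter 1: second symbol dominant
    [0.25, 0.25, 0.25, 0.25],  # letter 2: uninformative column
]
profile = [
    [0.8, 0.1, 0.05, 0.05],
    [0.3, 0.2, 0.3, 0.2],
    [0.05, 0.9, 0.03, 0.02],
]
letters = discretize_profile(profile, prototypes)  # → [0, 2, 1]
```

Once a database profile is reduced to such a letter sequence, comparing it with a query profile needs only a table lookup per position, which is the reason the text gives for the prefilter's speed.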


“…The rms error captures the absolute difference between the true and inferred couplings but is unable to clearly distinguish whether the relative ordering of the couplings has been correctly inferred. This limitation is problematic since many practical applications, such as the prediction of protein contacts from MF inference on sequence data [4,5,34], rely on proper rank ordering of the inferred couplings rather than their absolute magnitude. Information about the correct rank ordering can be determined from the rank correlation between the true and inferred couplings.…”

Section: A Results For the Ising Model (mentioning)

confidence: 99%
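The distinction drawn in the statement above, between rms error and rank ordering, can be made concrete with a small sketch. The coupling values below are invented for illustration, and the Spearman formula used is the textbook one that assumes no tied values; the helper names `rms_error`, `ranks`, and `spearman` are hypothetical.

```python
from math import sqrt

def rms_error(true, inferred):
    """Root-mean-square difference between two equally long lists."""
    return sqrt(sum((t - x) ** 2 for t, x in zip(true, inferred)) / len(true))

def ranks(values):
    """Rank of each value (1 = smallest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    r = [0] * len(values)
    for rank, k in enumerate(order, start=1):
        r[k] = rank
    return r

def spearman(true, inferred):
    """Spearman rank correlation, 1 - 6*sum(d^2) / (n*(n^2 - 1)); no ties."""
    n = len(true)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(true), ranks(inferred)))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

# Invented couplings: one estimate is offset but perfectly ordered,
# the other is numerically closer but swaps neighbouring values.
true_j = [0.1, 0.5, 0.9, 1.3]
offset_j = [1.1, 1.5, 1.9, 2.3]
swapped_j = [0.5, 0.1, 1.3, 0.9]

rms_offset, rho_offset = rms_error(true_j, offset_j), spearman(true_j, offset_j)
rms_swapped, rho_swapped = rms_error(true_j, swapped_j), spearman(true_j, swapped_j)
```

In this toy case the swapped estimate has the smaller rms error (about 0.4 against about 1.0) but the worse rank correlation (0.6 against 1.0), so a contact predictor that keeps only the top-ranked couplings would prefer the offset estimate.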

Large pseudocounts and L2-norm penalties are necessary for the mean-field inference of Ising and Potts models

et al. 2014


The mean-field (MF) approximation offers a simple, fast way to infer direct interactions between elements in a network of correlated variables, a common, computationally challenging problem with practical applications in fields ranging from physics and biology to the social sciences. However, MF methods achieve their best performance with strong regularization, well beyond Bayesian expectations, an empirical fact that is poorly understood. In this work, we study the influence of pseudocount and L2-norm regularization schemes on the quality of inferred Ising or Potts interaction networks from correlation data within the MF approximation. We argue, based on the analysis of small systems, that the optimal value of the regularization strength remains finite even if the sampling noise tends to zero, in order to correct for systematic biases introduced by the MF approximation. Our claim is corroborated by extensive numerical studies of diverse model systems and by the analytical study of the m-component spin model for large but finite m. Additionally, we find that pseudocount regularization is robust against sampling noise and often outperforms L2-norm regularization, particularly when the underlying network of interactions is strongly heterogeneous. Much better performances are generally obtained for the Ising model than for the Potts model, for which only couplings incoming onto medium-frequency symbols are reliably inferred.
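As a rough illustration of why mean-field inference needs regularization, the sketch below treats the smallest possible case: two ±1 spins, with the coupling estimated by the standard naive mean-field rule J = -(C^-1) applied to the off-diagonal element of the connected correlation matrix C. The pseudocount is modelled as mixing the empirical distribution with a uniform one at weight `alpha`; that form, the function name `mf_coupling_two_spins`, and the sample data are assumptions for this sketch, not the scheme analysed in the paper.

```python
def mf_coupling_two_spins(samples, alpha):
    """Naive mean-field coupling J_12 for two +/-1 spins.

    `alpha` in [0, 1) mixes the empirical statistics with those of a
    uniform distribution (assumed pseudocount form for this sketch).
    """
    n = len(samples)
    m1 = sum(s[0] for s in samples) / n
    m2 = sum(s[1] for s in samples) / n
    c12 = sum(s[0] * s[1] for s in samples) / n
    # Under the uniform distribution all three averages are zero,
    # so mixing simply shrinks them by (1 - alpha).
    m1, m2, c12 = (1 - alpha) * m1, (1 - alpha) * m2, (1 - alpha) * c12
    c11 = 1.0 - m1 * m1          # variance of a +/-1 spin
    c22 = 1.0 - m2 * m2
    cov = c12 - m1 * m2          # connected correlation
    det = c11 * c22 - cov * cov
    if det <= 1e-12:
        raise ValueError("correlation matrix is singular; regularize")
    return cov / det             # equals -(C^-1)_12 for a 2x2 matrix

# Invented data: the two spins are perfectly correlated in the sample.
samples = [(1, 1), (-1, -1), (1, 1), (-1, -1)]
j_regularized = mf_coupling_two_spins(samples, alpha=0.5)
```

Without a pseudocount (`alpha=0.0`) the perfectly correlated sample gives a singular correlation matrix and the function raises `ValueError`; with `alpha=0.5` the inversion is well defined and yields a finite coupling.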
