In the course of evolution, proteins show a remarkable conservation of their three-dimensional structure and their biological function, leading to strong evolutionary constraints on the sequence variability between homologous proteins. Our method aims at extracting such constraints from rapidly accumulating sequence data, and thereby at inferring protein structure and function from sequence information alone. Recently, global statistical inference methods (e.g. direct-coupling analysis, sparse inverse covariance estimation) have achieved a breakthrough towards this aim, and their predictions have been successfully integrated into tertiary and quaternary protein structure prediction methods. However, due to the discrete nature of the underlying variables (amino acids), exact inference requires time exponential in the protein length, and efficient approximations are needed for practical applicability. Here we propose a very efficient multivariate Gaussian modeling approach as a variant of direct-coupling analysis: the discrete amino-acid variables are replaced by continuous Gaussian random variables. The resulting statistical inference problem is efficiently and exactly solvable. We show that the quality of inference is comparable or superior to that achieved by mean-field approximations to inference with discrete variables, as done by direct-coupling analysis. This is true for (i) the prediction of residue-residue contacts in proteins, and (ii) the identification of protein-protein interaction partners in bacterial signal transduction. An implementation of our multivariate Gaussian approach is available at the website http://areeweb.polito.it/ricerca/cmp/code.
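The idea of replacing discrete amino-acid variables by Gaussian ones can be illustrated with a minimal sketch. It is not the authors' implementation (see the URL above for that). The one-hot encoding, the shrinkage toward a scaled identity (parameter `ridge`), and the Frobenius-norm pair score are assumptions chosen for illustration, and the function names are hypothetical. The sketch only shows why inference becomes exact and cheap: the couplings of a Gaussian model are given by minus the inverse covariance matrix.

```python
import numpy as np

def one_hot(msa, q):
    """Encode an integer alignment (M sequences x L sites, states 0..q-1)
    as an M x L*(q-1) matrix. The last state is dropped so that the
    covariance matrix is not singular by construction."""
    M, L = msa.shape
    s = q - 1
    X = np.zeros((M, L * s))
    for a in range(s):
        # column i*s + a holds the indicator "site i is in state a"
        X[:, a::s] = (msa == a)
    return X

def gaussian_contact_scores(msa, q, ridge=0.5):
    """Score site pairs from the precision matrix of a Gaussian model.
    `ridge` is an assumed shrinkage weight, not a value from the paper."""
    M, L = msa.shape
    s = q - 1
    X = one_hot(msa, q)
    C = np.cov(X, rowvar=False, bias=True)
    # Shrink toward a scaled identity so that C is safely invertible.
    C = (1.0 - ridge) * C + ridge * np.eye(L * s) / q
    J = -np.linalg.inv(C)  # Gaussian couplings = minus the precision matrix
    scores = np.zeros((L, L))
    for i in range(L):
        for j in range(i + 1, L):
            block = J[i * s:(i + 1) * s, j * s:(j + 1) * s]
            scores[i, j] = scores[j, i] = np.linalg.norm(block)  # Frobenius norm
    return scores

# Toy alignment: sites 0 and 1 co-vary perfectly, sites 2 and 3 are independent.
rng = np.random.default_rng(0)
a = rng.integers(0, 3, size=500)
toy = np.stack([a, a, rng.integers(0, 3, size=500),
                rng.integers(0, 3, size=500)], axis=1)
toy_scores = gaussian_contact_scores(toy, q=3)
```

On the toy alignment the co-varying pair (0, 1) receives a much larger score than the independent pair (2, 3). A real application would additionally need sequence reweighting and a 21-state alphabet, both omitted here.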
Understanding protein-protein interactions is central to our understanding of almost all complex biological processes. Computational tools exploiting rapidly growing genomic databases to characterize protein-protein interactions are urgently needed. Such methods should connect multiple scales from evolutionary conserved interactions between families of homologous proteins, over the identification of specifically interacting proteins in the case of multiple paralogs inside a species, down to the prediction of residues being in physical contact across interaction interfaces. Statistical inference methods detecting residue-residue coevolution have recently triggered considerable progress in using sequence data for quaternary protein structure prediction; they require, however, large joint alignments of homologous protein pairs known to interact. The generation of such alignments is a complex computational task on its own; application of coevolutionary modeling has, in turn, been restricted to proteins without paralogs, or to bacterial systems with the corresponding coding genes being colocalized in operons. Here we show that the direct coupling analysis of residue coevolution can be extended to connect the different scales, and simultaneously to match interacting paralogs, to identify interprotein residue-residue contacts and to discriminate interacting from noninteracting families in a multiprotein system. Our results extend the potential applications of coevolutionary analysis far beyond cases treatable so far.
Almost all biological processes depend on interacting proteins. Understanding protein-protein interactions is therefore key to our understanding of complex biological systems.
In this context, at least two questions are of interest: First, the question "who with whom," i.e., which proteins interact; this concerns the networks connecting specific proteins inside one organism, but also, in the context of this article, the evolutionary perspective of protein-protein interactions, which are conserved across different species. Their coevolution is at the basis of many modern computational techniques for characterizing protein-protein interactions. The second question is the question "how" proteins interact with each other, in particular, which residues are involved in the interaction interfaces, and which residues are in contact across the interfaces. Such knowledge may provide important mechanistic insight into questions related to interaction specificity or competitive interaction with partially shared interfaces.
The experimental identification of protein-protein interactions is an arduous task (for reviews, cf. refs. 1 and 2): High-throughput techniques that aim to identify protein-protein interactions in vivo or in vitro are well documented and include large-scale yeast two-hybrid assays and protein affinity mass spectrometry assays. Such large-scale efforts have revealed useful information but are hampered by high false positive and false negative error rates. Structural approaches based on protein cocrystalli...
The mechanical unfolding of proteins is studied by extending the Wako-Saitô-Muñoz-Eaton model. This model is generalized by including an external force, and its thermodynamics turns out to be exactly solvable. We consider two molecules, the 27th immunoglobulin domain of titin and the protein PIN1. We determine equilibrium force-extension curves for the titin domain and study the mechanical unfolding of this molecule, finding good agreement with experiments. By using an extended form of the Jarzynski equality, we compute the free energy landscape of PIN1 as a function of the molecule length.
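As a point of reference for the last sentence, the standard Jarzynski equality relates the free energy difference ΔF between two equilibrium states to the work W done in repeated non-equilibrium pulling runs: exp(-ΔF/kT) = ⟨exp(-W/kT)⟩. The sketch below implements only this standard estimator, not the extended form used in the paper to resolve the landscape as a function of molecule length. The function name and the units (kT = 1 by default) are assumptions made for illustration.

```python
import numpy as np

def jarzynski_free_energy(work, kT=1.0):
    """Estimate the free energy difference from a sample of work values
    using the standard Jarzynski equality,
        dF = -kT * log( mean( exp(-W / kT) ) ).
    The largest exponent is factored out before exponentiating so that
    large work values do not underflow."""
    w = np.asarray(work, dtype=float) / kT
    shift = (-w).max()
    log_mean = shift + np.log(np.mean(np.exp(-w - shift)))
    return -kT * log_mean

# If every run costs exactly the same work, that work is the free energy difference.
dF_constant = jarzynski_free_energy([2.0, 2.0, 2.0])

# With fluctuating work, the estimate lies below the mean work (Jensen's inequality).
dF_spread = jarzynski_free_energy([1.0, 3.0])
```

The exponential average is dominated by rare low-work trajectories, so in practice many pulling runs are needed before the estimate stabilizes.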
We develop a mean-field theory for the totally asymmetric simple exclusion process (TASEP) with open boundaries, in order to investigate the so-called dynamical transition. The latter phenomenon appears as a singularity in the relaxation rate of the system toward its non-equilibrium steady state. In the high-density (low-density) phase, the relaxation rate becomes independent of the injection (extraction) rate, at a certain critical value of the parameter itself, and this transition is not accompanied by any qualitative change in the steady-state behavior. We characterize the relaxation rate by providing rigorous bounds, which become tight in the thermodynamic limit. These results are generalized to the TASEP with Langmuir kinetics, where particles can also bind to empty sites or unbind from occupied ones, in the symmetric case of equal binding/unbinding rates. The theory predicts a dynamical transition to occur in this case as well.
PACS numbers: 02.50.Ga, 05.50.+q, 89.75.-k
§ Note however that a distinction of the HD phase into the subphases HD' and HD" (and of LD into LD' and LD") does not appear, because, in analogy with the pure TASEP, it cannot be detected by the mean-field theory alone.
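For orientation, the mean-field description of the open-boundary TASEP mentioned in the abstract above can be sketched numerically. The code below integrates the textbook mean-field rate equations, dρ_i/dt = ρ_{i-1}(1 - ρ_i) - ρ_i(1 - ρ_{i+1}), with injection rate `alpha` at the left end and extraction rate `beta` at the right end, using a plain Euler scheme. It is an illustration under these assumptions, not the authors' analysis: it does not compute the relaxation rate or its bounds, and the function name, lattice size, and time step are hypothetical choices.

```python
import numpy as np

def mean_field_tasep(alpha, beta, n_sites=50, dt=0.05, t_max=500.0):
    """Relax the mean-field densities of an open-boundary TASEP,
    starting from an empty lattice. Returns (densities, current)."""
    rho = np.zeros(n_sites)
    for _ in range(int(t_max / dt)):
        # Left neighbours: the entry reservoir acts like a site of density alpha.
        left = np.concatenate(([alpha], rho[:-1]))
        # Right neighbours: the exit acts like a site of density 1 - beta.
        right = np.concatenate((rho[1:], [1.0 - beta]))
        inflow = left * (1.0 - rho)
        outflow = rho * (1.0 - right)
        rho = rho + dt * (inflow - outflow)
    current = alpha * (1.0 - rho[0])  # particle current through the first bond
    return rho, current

# Low-density example: for alpha + beta = 1 the mean-field profile is flat at alpha.
rho_ld, current_ld = mean_field_tasep(alpha=0.2, beta=0.8)
```

For `alpha=0.2`, `beta=0.8` the densities settle at the flat profile ρ_i = 0.2 with current 0.2 × 0.8 = 0.16, which can be checked by substituting into the rate equations.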