Specific protein-protein interactions are crucial in the cell, both to ensure the formation and stability of multiprotein complexes and to enable signal transduction in various pathways. Functional interactions between proteins result in coevolution between the interaction partners, causing their sequences to be correlated. Here we exploit these correlations to accurately identify, from sequence data alone, which proteins are specific interaction partners. Our general approach, which employs a pairwise maximum entropy model to infer couplings between residues, has been successfully used to predict the 3D structures of proteins from sequences. Thus inspired, we introduce an iterative algorithm to predict specific interaction partners from two protein families whose members are known to interact. We first assess the algorithm's performance on histidine kinases and response regulators from bacterial two-component signaling systems. We obtain a striking 0.93 true positive fraction on our complete dataset without any a priori knowledge of interaction partners, and we uncover the origin of this success. We then apply the algorithm to proteins from ATP-binding cassette (ABC) transporter complexes, and obtain accurate predictions in these systems as well. Finally, we present two metrics that accurately distinguish interacting protein families from noninteracting ones, using only sequence data. Many key cellular processes are carried out by interacting proteins. For instance, specific protein-protein interactions ensure proper signal transduction in various pathways. Hence, mapping specific protein-protein interactions is central to a systems-level understanding of cells, and has broad applications to areas such as drug targeting. High-throughput experiments have recently elucidated a substantial fraction of protein-protein interactions in a few model organisms (1), but such experiments remain challenging. Meanwhile, there has been an explosion of available sequence data.
Can we exploit this abundant new sequence data to identify specific protein-protein interaction partners? Specific interactions between proteins imply evolutionary constraints on the interacting partners. For instance, mutation of a contact residue in one partner generally impairs binding, but may be compensated by a complementary mutation in the other partner. This coevolution of interaction partners results in correlations between their amino acid sequences. Similar correlations exist within single proteins, for example, between amino acids that are in contact in the folded protein. However, the simple fact of a correlation between residues in a multiple sequence alignment is only weakly predictive of a 3D contact (2-4), as correlation can also stem from indirect effects. Fortunately, global statistical models allow direct and indirect interactions to be disentangled (5-7). In particular, the maximum entropy principle (8) specifies the least-structured global statistical model consistent with the one- and two-point statistics of an alignment (5). This approach has recently been used with success to determine 3D ...
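As a minimal sketch of what the one- and two-point statistics of an alignment are, the Python example below computes per-column residue frequencies and pairwise joint frequencies for a toy alignment, then the mutual information of each column pair. The alignment and all function names are hypothetical and chosen for illustration only. Mutual information is a simple local correlation measure, swapped in here for clarity; it is not the pairwise maximum entropy model described above, which is fitted to reproduce these same statistics globally.

```python
from collections import Counter
from itertools import combinations
from math import log


def site_frequencies(alignment):
    """One-point statistics: frequency of each residue in each column."""
    n_seqs = len(alignment)
    length = len(alignment[0])
    return [
        {a: c / n_seqs for a, c in Counter(seq[i] for seq in alignment).items()}
        for i in range(length)
    ]


def pair_frequencies(alignment):
    """Two-point statistics: joint residue frequencies for columns i < j."""
    n_seqs = len(alignment)
    length = len(alignment[0])
    return {
        (i, j): {
            ab: c / n_seqs
            for ab, c in Counter((seq[i], seq[j]) for seq in alignment).items()
        }
        for i, j in combinations(range(length), 2)
    }


def mutual_information(f_i, f_j, f_ij):
    """Mutual information (in nats) between two alignment columns."""
    return sum(p * log(p / (f_i[a] * f_j[b])) for (a, b), p in f_ij.items())


# Hypothetical toy alignment: columns 0 and 1 covary perfectly,
# column 2 is fully conserved, column 3 varies independently of column 0.
toy_alignment = ["AKLE", "AKLD", "GRLE", "GRLD"]

f1 = site_frequencies(toy_alignment)
f2 = pair_frequencies(toy_alignment)
mi = {
    pair: mutual_information(f1[pair[0]], f1[pair[1]], joint)
    for pair, joint in f2.items()
}

for pair, value in sorted(mi.items()):
    print(pair, round(value, 3))
```

In this toy case only the covarying column pair (0, 1) has nonzero mutual information. On real alignments, such local scores also pick up indirect correlations, which is the limitation that global models are meant to address.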
Specific protein-protein interactions are crucial in the cell, both to ensure the formation and stability of multi-protein complexes, and to enable signal transduction in various pathways. Functional interactions between proteins result in coevolution between the interaction partners. Hence, the sequences of interacting partners are correlated. Here we exploit these correlations to accurately identify which proteins are specific interaction partners from sequence data alone. Our general approach, which employs a pairwise maximum entropy model to infer direct couplings between residues, has been successfully used to predict the three-dimensional structures of proteins from sequences. Building on this approach, we introduce an iterative algorithm to predict specific interaction partners from among the members of two protein families. We assess the algorithm's performance on histidine kinases and response regulators from bacterial two-component signaling systems. The algorithm proves successful without any a priori knowledge of interaction partners, yielding a striking 0.93 true positive fraction on our complete dataset, and we uncover the origin of this surprising success. Finally, we discuss how our method could be used to predict novel protein-protein interactions.
INTRODUCTION
Many key cellular processes are carried out by interacting proteins. For instance, transient protein-protein interactions determine signaling pathways, and their specificity ensures proper signal transduction. Hence, mapping specific protein-protein interactions is central to a systems-level understanding of cells, and has broad applications to areas such as drug targeting. High-throughput experimental methods have recently elucidated a substantial fraction of protein-protein interactions in a few model organisms [1], but experimental approaches remain challenging. Meanwhile, major progress in sequencing has led to an explosion of available sequence data.
Can we exploit this abundant new sequence data to identify specific protein-protein interaction partners? Specific interactions between proteins imply evolutionary constraints on the interacting partners. For instance, mutation of a contact residue in one partner generically impairs binding, but may be compensated by a complementary mutation in the other partner. This co-evolution of interaction partners results in a correlation of their amino-acid sequences. Similar correlations exist within single proteins, between amino acids that are in contact in the folded protein. However, the simple fact of a correlation between residues in a multiple sequence alignment is only weakly predictive of a three-dimensional contact [2-4], as correlation can also stem from other effects such as phylogeny and indirect interactions. Fortunately, global statistical models provide a means to disentangle direct and indirect interactions [5-7]. In particular, the maximum entropy pri...
The essential outer membrane β-barrel protein BamA forms a complex with four lipoprotein partners BamBCDE that assembles β-barrel proteins into the outer membrane of Escherichia coli. Detailed genetic studies have shown that BamA cycles through multiple conformations during substrate assembly, suggesting that a complex network of residues may be involved in coordinating conformational changes and lipoprotein partner function. While genetic analysis of BamA has been informative, it has also been slow in the absence of a straightforward selection for mutants. Here we take a bioinformatic approach to identify candidate residues for mutagenesis using direct coupling analysis. Starting with the BamA paralog FhaC, we show that direct coupling analysis works well for large β-barrel proteins, identifying pairs of residues in close proximity in tertiary structure with a true positive rate of 0.64 over the top 50 predictions. To reduce the effects of noise, we designed and incorporated a novel structured prior into the empirical correlation matrix, dramatically increasing the FhaC true positive rate from 0.64 to 0.88 over the top 50 predictions. Our direct coupling analysis of BamA implicates residues R661 and D740 in a functional interaction. We find that the substitutions R661G and D740G each confer OM permeability defects and destabilize the BamA β-barrel. We also identify synthetic phenotypes and cross-suppressors that suggest R661 and D740 function in a similar process and may interact directly.
We expect that the direct coupling analysis approach to informed mutagenesis will be particularly useful in systems lacking adequate selections and for dynamic proteins with multiple conformations. As a Gram-negative bacterium, Escherichia coli is enveloped by two membranes, a cytoplasmic or inner membrane comprising a phospholipid bilayer and an outer membrane (OM) comprising an asymmetric bilayer with a phospholipid inner leaflet and a lipopolysaccharide outer leaflet (Kamio and Nikaido 1976; Silhavy et al. 2010). An aqueous compartment called the periplasm separates the two membranes. Diffusion from the extracellular milieu into the periplasm is facilitated by β-barrel proteins embedded in the OM (OMPs) (Nikaido 2003). OMPs have additional structural and enzymatic functions (Tamm et al. 2004); however, all essential OMPs function in OM biogenesis. The folding and assembly of nascent OMPs is catalyzed by the β-barrel assembly machine (Bam) complex at the OM. The Bam complex is composed of BamA, itself an OMP, and four associated lipoproteins, BamBCDE (Wu et al. 2005; Sklar et al. 2007a). BamA is thought to be the central complex member. It contains five periplasmic polypeptide transport associated (POTRA) domains, which scaffold the lipoproteins and likely interact with substrate (Kim et al. 2007). Its β-barrel domain contains an extended extracellular loop, loop 6 (L6), which can adopt protease-sensitive and -resistant conformations, indicating that BamA undergoes conformational changes during OMP assembly (Rigel et ...
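The "true positive rate over the top 50 predictions" quoted in the abstract above can be read as the fraction of the highest-scoring predicted residue pairs that are in contact in a reference structure. The Python sketch below illustrates that reading on toy data. The function name, the scores, and the contact set are all hypothetical, and the distance cutoff that defines a contact is assumed to have been applied already when building `contacts`.

```python
def top_n_true_positive_rate(scored_pairs, contacts, n):
    """Fraction of the n highest-scoring residue pairs that are true contacts.

    scored_pairs: list of ((i, j), score) tuples, higher score = stronger coupling.
    contacts: set of frozenset({i, j}) pairs regarded as contacts in the structure.
    """
    ranked = sorted(scored_pairs, key=lambda item: item[1], reverse=True)[:n]
    if not ranked:
        raise ValueError("no predictions to evaluate")
    hits = sum(1 for pair, _ in ranked if frozenset(pair) in contacts)
    return hits / len(ranked)


# Hypothetical toy data: four scored residue pairs, two of which are contacts.
toy_scores = [((1, 5), 0.9), ((2, 8), 0.8), ((3, 4), 0.7), ((6, 9), 0.2)]
toy_contacts = {frozenset((1, 5)), frozenset((3, 4))}

rate_top3 = top_n_true_positive_rate(toy_scores, toy_contacts, n=3)
rate_top2 = top_n_true_positive_rate(toy_scores, toy_contacts, n=2)
print(rate_top3, rate_top2)
```

Storing contacts as `frozenset` pairs makes the lookup independent of residue order, so a prediction for (5, 1) would match a contact recorded as (1, 5).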
Targeted, translational LacZ fusions provided the initial support for the signal sequence hypothesis in prokaryotes and allowed for selection of the mutations that identified the Sec translocon. Many of these selections relied on the fact that expression of targeted, translational lacZ fusions like malE-lacZ and lamB-lacZ42-1 causes lethal toxicity as folded LacZ jams the translocation pore. However, there is another class of targeted LacZ fusions that do not jam the translocon. These targeted, nonjamming fusions also show toxic phenotypes that may be useful for selecting mutations in genes involved in posttranslocational protein folding and targeting; however, they have not been investigated to the same extent as their jamming counterparts. In fact, it is still unclear whether LacZ can be fully translocated in these fusions. It may be that they simply partition into the inner membrane where they can no longer participate in folding or assembly. In the present study, we systematically characterize the nonjamming fusions and determine their ultimate localization. We report that LacZ can be fully translocated into the periplasm, where it is toxic. We show that this toxicity is likely due to LacZ misfolding and that, in the absence of the periplasmic disulfide bond catalyst DsbA, LacZ folds in the periplasm. Using the novel phenotype of periplasmic β-galactosidase activity, we show that the periplasmic chaperone FkpA contributes to LacZ folding in this nonnative compartment. We propose that targeted, nonjamming LacZ fusions may be used to further study folding and targeting in the periplasm of Escherichia coli.