Determining protein functions from genomic sequences is a central goal of bioinformatics. We present a method based on the assumption that proteins that function together in a pathway or structural complex are likely to evolve in a correlated fashion. During evolution, all such functionally linked proteins tend to be either preserved or eliminated in a new species. We describe this property of correlated evolution by characterizing each protein by its phylogenetic profile, a string that encodes the presence or absence of a protein in every known genome. We show that proteins having matching or similar profiles strongly tend to be functionally linked. This method of phylogenetic profiling allows us to predict the function of uncharacterized proteins.
The fully sequenced genomes of numerous organisms offer large amounts of information about cellular biology (see the genomes listed at the web site of The Institute for Genome Research: www.tigr.org). It is a central challenge of bioinformatics to use this information in discovering the function of proteins. Functional assignments of genes come primarily from biochemical experimentation, which can be extended by matching recently sequenced proteins to those that have already been characterized (1). For the exceptionally well studied genome of Escherichia coli (2), these and related techniques (3, 4) have led to tentative functional assignments of slightly more than half of its proteins (5). The problem of assigning functions to the remaining proteins is addressed here.
Our computational method detects proteins that participate in a common structural complex or metabolic pathway. Proteins within these groups are defined as functionally linked. The underlying hypothesis is that functionally linked proteins evolve in a correlated fashion, and, therefore, they have homologs in the same subset of organisms. For instance, we expect to find flagellar proteins in bacteria that possess flagella but not in other organisms.
In short, we show that if two proteins have homologs in the same subset of fully sequenced organisms, they are likely to be functionally linked. We exploit this property systematically to map links between all the proteins coded by a genome. In general, pairs of functionally linked proteins have no amino acid sequence similarity with each other and, therefore, cannot be linked by conventional sequence-alignment techniques.
METHODS
To represent the subset of organisms that contain a homolog, we constructed a phylogenetic profile for each protein. This profile is a string with n entries, each one bit, where n corresponds to the number of genomes (16 in the present article). We indicate the presence of a homolog to a given protein in the nth genome with an entry of unity at the nth position. If no homolog is found, the entry is zero. Proteins are clustered according to the similarity of their phylogenetic profiles. Similar profiles show a correlated pattern of inheritance and, by implication, functional linkage. The method predicts that the functions of u...
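The profile construction described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the genome labels, protein names, and homolog sets are hypothetical toy data, and the use of Hamming distance (the number of differing bits) with a `max_diff` cutoff as the similarity measure is an assumption made for illustration only.

```python
from itertools import combinations


def build_profile(present_in, genomes):
    """Return a bit string with '1' at each genome that contains a homolog."""
    return "".join("1" if g in present_in else "0" for g in genomes)


def hamming(a, b):
    """Count the positions at which two equal-length profiles differ."""
    return sum(x != y for x, y in zip(a, b))


def linked_pairs(profiles, max_diff=0):
    """List protein pairs whose profiles differ in at most max_diff bits."""
    names = sorted(profiles)
    return [
        (p, q)
        for p, q in combinations(names, 2)
        if hamming(profiles[p], profiles[q]) <= max_diff
    ]


# Hypothetical toy data: four genomes instead of the 16 used in the article.
genomes = ["G1", "G2", "G3", "G4"]
homologs = {
    "protA": {"G1", "G3"},
    "protB": {"G1", "G3"},
    "protC": {"G1", "G2", "G3", "G4"},
    "protD": {"G1", "G3", "G4"},
}
profiles = {name: build_profile(found, genomes) for name, found in homologs.items()}

exact = linked_pairs(profiles, max_diff=0)  # identical profiles only
near = linked_pairs(profiles, max_diff=1)   # allow one differing genome
```

In this toy set, protA and protB share the profile `1010` and are the only pair reported at `max_diff=0`; relaxing the cutoff to one bit also groups protD with them, which mirrors the idea that similar, not only identical, profiles may indicate linkage.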
A computational method is proposed for inferring protein interactions from genome sequences on the basis of the observation that some pairs of interacting proteins have homologs in another organism fused into a single protein chain. Searching sequences from many genomes revealed 6809 such putative protein-protein interactions in Escherichia coli and 45,502 in yeast. Many members of these pairs were confirmed as functionally related; computational filtering further enriches for interactions. Some proteins have links to several other proteins; these coupled links appear to represent functional interactions such as complexes or pathways. Experimentally confirmed interacting pairs are documented in a Database of Interacting Proteins.
The lives of biological cells are controlled by interacting proteins in metabolic and signaling pathways and in complexes such as the molecular machines that synthesize and use adenosine triphosphate (ATP), replicate and translate genes, or build up the cytoskeletal infrastructure (1). Our knowledge of protein-protein interactions has been accumulated from biochemical and genetic experiments, including the widely used yeast two-hybrid test (2). Here we ask if protein-protein interactions can be recognized from genome sequences by purely computational means.
Some interacting proteins such as the Gyr A and Gyr B subunits of Escherichia coli DNA gyrase are fused into a single chain in another organism, in this case the topoisomerase II of yeast (3). Thus, the sequence similarities of Gyr A (804 amino acid residues) and Gyr B (875 residues) to different segments of the topoisomerase II (1429 residues) might be used to predict that Gyr A and Gyr B interact in E. coli.
To find other such putative protein interactions in E. coli, we searched the 4290 protein sequences of the E. coli genome (4) for these patterns of sequence homology (5).
We found 6809 pairs of nonhomologous sequences, both members of the pair having significant similarity (6) to a single protein in some other genome that we term a Rosetta Stone sequence because it deciphers the interaction between the protein pairs. The 4290 proteins could form at most $(4290)^2/2 = 9 \times 10^6$ pair interactions, but we would expect many fewer interactions in a functioning cell; roughly 2 to 10 interactions for each protein does not seem unreasonably many. Each of these 6809 pairs is a candidate for a pair of interacting proteins in E. coli. Five such candidates are shown in Fig. 1. The first three pairs of E. coli proteins were among those readily confirmed from the biochemical literature to interact. The final two pairs of proteins are not known to interact. They are representatives of many such pairs whose putative interactions at this time must be taken as testable hypotheses.
We devised three independent tests of interactions predicted by the method we term domain fusion analysis, each showing that a reasonable fraction may in fact interact. The first method uses the annotation of proteins given in the SWISS-PROT database (7). For cases wh...
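The Rosetta Stone search described above can be sketched roughly as follows. This is an illustrative simplification, not the authors' pipeline: it assumes alignment hits have already been computed and are given as residue intervals on each candidate fusion protein, it treats alignment to non-overlapping segments as a stand-in for the requirement that the two query proteins be nonhomologous, and every identifier and coordinate in the toy data is hypothetical.

```python
from itertools import combinations


def overlap(a, b):
    """Number of residues shared by two closed intervals (start, end)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def rosetta_pairs(hits, max_overlap=0):
    """Find query pairs that align to distinct segments of one fusion protein.

    hits maps a candidate fusion protein id to a dict of
    {query protein id: (start, end)} giving where each query aligns on it.
    """
    pairs = set()
    for segments in hits.values():
        for p, q in combinations(sorted(segments), 2):
            if overlap(segments[p], segments[q]) <= max_overlap:
                pairs.add((p, q))
    return sorted(pairs)


def max_pairs(n):
    """Upper bound on the number of distinct unordered pairs among n proteins."""
    return n * (n - 1) // 2


# Hypothetical toy data: three query proteins hitting one fusion candidate.
hits = {
    "fusionX": {
        "protA": (1, 300),
        "protB": (320, 700),
        "protC": (250, 500),  # overlaps both protA and protB
    }
}
candidates = rosetta_pairs(hits)
upper_bound = max_pairs(4290)
```

Here only protA and protB are reported, because protC overlaps both of them on the fusion protein. The exact count of distinct pairs among 4290 proteins, n(n-1)/2 = 9,199,905, agrees with the rounded figure of about 9 × 10^6 quoted in the text.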
The world's crop productivity is stagnating whereas population growth, rising affluence, and mandates for biofuels put increasing demands on agriculture. Meanwhile, demand for increasing cropland competes with equally crucial global sustainability and environmental protection needs. Addressing this looming agricultural crisis will be one of our greatest scientific challenges in the coming decades, and success will require substantial improvements at many levels. We assert that increasing the efficiency and productivity of photosynthesis in crop plants will be essential if this grand challenge is to be met. Here, we explore an array of prospective redesigns of plant systems at various scales, all aimed at increasing crop yields through improved photosynthetic efficiency and performance. Prospects range from straightforward alterations, already supported by preliminary evidence of feasibility, to substantial redesigns that are currently only conceptual, but that may be enabled by new developments in synthetic biology. Although some proposed redesigns are certain to face obstacles that will require alternate routes, the efforts should lead to new discoveries and technical advances with important impacts on the global problem of crop productivity and bioenergy production.
light capture/conversion | carbon capture/conversion | smart canopy | enabling plant biotechnology tools | sustainable crop production
Increasing demands for global food production over the next several decades portend a huge burden on the world's shrinking farmlands. Increasing global affluence, population growth, and demands for a bioeconomy (including livestock feed, bioenergy, chemical feedstocks, and biopharmaceuticals) will all require increased agricultural productivity, perhaps by as much as 60-120% over 2005 levels (e.g., refs. 1 and 2), putting increased productivity on a collision course with environmental and sustainability goals (3).
The 45 y from 1960 to 2005 saw global food production grow ∼160%, mostly (135%) by improved production on
Nature provides many examples of self- and co-assembling protein-based molecular machines, including icosahedral protein cages that serve as scaffolds, enzymes, and compartments for essential biochemical reactions and icosahedral virus capsids, which encapsidate and protect viral genomes and mediate entry into host cells. Inspired by these natural materials, we report the computational design and experimental characterization of co-assembling two-component 120-subunit icosahedral protein nanostructures with molecular weights (1.8–2.8 MDa) and dimensions (24–40 nm diameter) comparable to small viral capsids. Electron microscopy, SAXS, and X-ray crystallography show that ten designs spanning three distinct icosahedral architectures form materials closely matching the design models. In vitro assembly of independently purified components reveals rapid assembly rates comparable to viral capsids and enables controlled packaging of molecular cargo via charge complementarity. The ability to design megadalton-scale materials with atomic-level accuracy and controllable assembly opens the door to a new generation of genetically programmable protein-based molecular machines.
The self-assembly of proteins into highly ordered nanoscale architectures is a hallmark of biological systems. The sophisticated functions of these molecular machines inspire the development of methods to engineer novel self-assembling protein structures. Although there has been exciting recent progress in this area, designing multi-component protein nanomaterials with high accuracy remains an outstanding challenge. Here we address this challenge by developing a general computational method for designing protein nanomaterials in which two distinct types of subunits coassemble to a target symmetric architecture. We use the method to design five novel 24-subunit cage-like protein nanomaterials in two distinct symmetric architectures, and experimentally demonstrate that the structures of the materials are in close agreement with the computational design models. The accuracy of the method and the universe of two-component materials that it makes accessible pave the way for the construction of functional protein nanomaterials tailored to specific applications.