Automated annotation of protein function is challenging. As the number of sequenced genomes rapidly grows, the overwhelming majority of protein products can only be annotated computationally. If computational predictions are to be relied upon, it is crucial that the accuracy of these methods be high. Here we report the results from the first large-scale community-based Critical Assessment of protein Function Annotation (CAFA) experiment. Fifty-four methods representing the state of the art in protein function prediction were evaluated on a target set of 866 proteins from eleven organisms. Two findings stand out: (i) today’s best protein function prediction algorithms significantly outperformed widely used first-generation methods, with large gains on all types of targets; and (ii) although the top methods perform well enough to guide experiments, there is a significant need for improvement of currently available tools.
Mechanisms of DNA repair and mutagenesis are defined on the basis of relatively few proteins acting on DNA, yet the identities and functions of all proteins required are unknown. Here, we identify the network that underlies mutagenic repair of DNA breaks in stressed Escherichia coli and define functions for much of it. Using a comprehensive screen, we identified a network of ≥93 genes that function in mutation. Most operate upstream of activation of three required stress responses (RpoS, RpoE, and SOS, key network hubs), apparently sensing stress. The results reveal how a network integrates mutagenic repair into the biology of the cell, show specific pathways of environmental sensing, demonstrate the centrality of stress responses, and imply that these responses are attractive as potential drug targets for blocking the evolution of pathogens.
Summary
A central problem in biology is to identify gene function. One approach is to infer function in large supergenomic networks of interactions and ancestral relationships among genes; however, their analysis can be computationally prohibitive. We show here that these biological networks are compressible. They can be shrunk dramatically by eliminating redundant evolutionary relationships, and this process is efficient because in these networks the number of compressible elements rises linearly rather than exponentially, as it does in other complex networks. Compression enables global network analysis to computationally harness hundreds of interconnected genomes and to produce functional predictions. As a demonstration, we show that the essential but functionally uncharacterized Plasmodium falciparum antigen EXP1 is a membrane glutathione S-transferase. EXP1 efficiently degrades cytotoxic hematin, is potently inhibited by artesunate, and is associated with artesunate metabolism and susceptibility in drug-pressured malaria parasites. These data implicate EXP1 in the mode of action of a frontline antimalarial drug.
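To make the idea of removing redundant relationships concrete, the following is a minimal Python sketch of one generic form of lossless graph compression: nodes that share exactly the same neighbor set are merged into a single supernode, so their parallel edges collapse into one. This is an illustrative assumption, not the compression procedure used by the authors; the function name `compress_graph` and the toy gene labels are hypothetical.

```python
from collections import defaultdict


def compress_graph(edges):
    """Merge nodes with identical neighbor sets into supernodes.

    edges: iterable of (node_a, node_b) pairs in an undirected graph
           without self-loops.
    Returns (supernode, compressed_edges), where `supernode` maps each
    node to a tuple naming its group and `compressed_edges` is a set of
    frozensets, each holding the two supernodes that an edge connects.
    """
    edges = list(edges)

    # Build the neighbor set of every node.
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    # Group nodes by neighbor set; members of a group are interchangeable.
    groups = defaultdict(list)
    for node, nbrs in neighbors.items():
        groups[frozenset(nbrs)].append(node)

    # Label each group by the sorted tuple of its members.
    supernode = {}
    for members in groups.values():
        label = tuple(sorted(members))
        for member in members:
            supernode[member] = label

    # Rewrite every edge between supernodes; duplicates vanish in the set.
    compressed_edges = {frozenset((supernode[a], supernode[b])) for a, b in edges}
    return supernode, compressed_edges


# Toy example (hypothetical labels): three genes each linked to the same two genes.
toy_edges = [(g, h) for g in ("g1", "g2", "g3") for h in ("h1", "h2")]
mapping, compressed = compress_graph(toy_edges)
print(len(toy_edges), "edges before,", len(compressed), "after")
```

Because the mapping from nodes to supernodes is kept, the original edge list can be rebuilt from the compressed one, which is what makes this kind of merging lossless.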
Background: Structural genomics projects such as the Protein Structure Initiative (PSI) yield many new structures, but often these have no known molecular functions. One approach to recover this information is to use 3D templates, structure-function motifs that consist of a few functionally critical amino acids and may suggest functional similarity when geometrically matched to other structures. Since experimentally determined functional sites are not common enough to define 3D templates on a large scale, this work tests a computational strategy to select relevant residues for 3D templates.
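As a rough illustration of what geometric matching of a 3D template can mean, the sketch below compares the pairwise distances between template residues with those of candidate residue sets in a target structure. It is a simplified, brute-force example under stated assumptions, not the method evaluated in this work: the function names, the one-point-per-residue representation, and the 1.0 Å tolerance are all hypothetical, and distance comparison alone cannot tell a motif from its mirror image.

```python
from itertools import permutations
from math import dist


def pairwise_distances(coords):
    """Distances between every pair of points, in a fixed (i < j) order."""
    n = len(coords)
    return [dist(coords[i], coords[j]) for i in range(n) for j in range(i + 1, n)]


def match_template(template, structure, tolerance=1.0):
    """Find residue sets in `structure` whose geometry resembles `template`.

    template:  list of (residue_type, (x, y, z)), one point per residue
    structure: list of (residue_id, residue_type, (x, y, z))
    tolerance: largest allowed difference in any pairwise distance
               (assumed to be in angstroms)
    Returns a list of (residue_ids, max_deviation), best match first.
    """
    template_types = [res_type for res_type, _ in template]
    template_dists = pairwise_distances([xyz for _, xyz in template])

    hits = []
    # Brute force over ordered residue selections; fine for small inputs only.
    for combo in permutations(structure, len(template)):
        if [res[1] for res in combo] != template_types:
            continue  # residue types must agree position by position
        combo_dists = pairwise_distances([res[2] for res in combo])
        deviation = max(
            (abs(a - b) for a, b in zip(template_dists, combo_dists)),
            default=0.0,
        )
        if deviation <= tolerance:
            hits.append((tuple(res[0] for res in combo), deviation))
    return sorted(hits, key=lambda hit: hit[1])


# Toy data (made-up coordinates): a three-residue motif and a small structure.
toy_template = [("HIS", (0, 0, 0)), ("ASP", (3, 0, 0)), ("SER", (0, 4, 0))]
toy_structure = [
    (10, "HIS", (1, 1, 1)),
    (25, "ASP", (4, 1, 1)),
    (40, "SER", (1, 5, 1)),
    (55, "SER", (9, 9, 9)),
    (60, "GLY", (0, 0, 0)),
]
print(match_template(toy_template, toy_structure))
```

Comparing internal distances avoids having to superimpose the two structures first, at the cost of the mirror-image ambiguity noted above; a practical tool would also need a faster search than enumerating all residue selections.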