Background: A major bottleneck in our understanding of the molecular underpinnings of life is the assignment of function to proteins. While molecular experiments provide the most reliable annotation of proteins, their relatively low throughput and restricted purview have led to an increasing role for computational function prediction. However, assessing methods for protein function prediction and tracking progress in the field remain challenging. Results: We conducted the second critical assessment of functional annotation (CAFA), a timed challenge to assess computational methods that automatically assign protein function. We evaluated 126 methods from 56 research groups for their ability to predict biological functions using Gene Ontology and gene-disease associations using Human Phenotype Ontology on a set of 3681 proteins from 18 species. CAFA2 featured expanded analysis compared with CAFA1, with regard to data set size, variety, and assessment metrics. To review progress in the field, the analysis compared the best methods from CAFA1 to those of CAFA2. Conclusions: The top-performing methods in CAFA2 outperformed those from CAFA1. This increased accuracy can be attributed to a combination of the growing number of experimental annotations and improved methods for function prediction. The assessment also revealed that the definition of top-performing algorithms is ontology specific, that different performance metrics can be used to probe the nature of accurate predictions, and the relative diversity of predictions in the biological process and human phenotype ontologies. While there was methodological improvement between CAFA1 and CAFA2, the interpretation of results and usefulness of individual methods remain context-dependent. Electronic supplementary material: The online version of this article (doi:10.1186/s13059-016-1037-6) contains supplementary material, which is available to authorized users.
The Gene Ontology (GO) Consortium (GOC, http://www.geneontology.org) is a community-based bioinformatics resource that classifies gene product function through the use of structured, controlled vocabularies. Over the past year, the GOC has implemented several processes to increase the quantity, quality and specificity of GO annotations. First, the number of manual, literature-based annotations has grown at an increasing rate. Second, as a result of a new ‘phylogenetic annotation’ process, manually reviewed, homology-based annotations are becoming available for a broad range of species. Third, the quality of GO annotations has been improved through a streamlined process for, and automated quality checks of, GO annotations deposited by different annotation groups. Fourth, the consistency and correctness of the ontology itself have increased through the use of automated reasoning tools. Finally, the GO has been expanded not only to cover new areas of biology through focused interaction with experts, but also to capture greater specificity in all areas of the ontology using tools for adding new combinatorial terms. The GOC works closely with other ontology developers to support integrated use of terminologies. The GOC supports its user community through the use of e-mail lists, social media and web-based resources.
Plant and algal prolyl 4-hydroxylases (P4Hs) are key enzymes in the synthesis of cell wall components. These monomeric enzymes belong to the 2-oxoglutarate dependent superfamily of enzymes characterized by a conserved jelly-roll framework. This algal P4H has high sequence similarity to the catalytic domain of the vertebrate, tetrameric collagen P4Hs, whereas there are distinct sequence differences with the oxygen-sensing hypoxia-inducible factor P4H subfamily of enzymes. We present here a 1.98-Å crystal structure of the algal Chlamydomonas reinhardtii P4H-1 complexed with Zn2+ and a proline-rich (Ser-Pro)5 substrate. This ternary complex captures the competent mode of binding of the peptide substrate, being bound in a left-handed (poly)L-proline type II conformation in a tunnel shaped by two loops. These two loops are mostly disordered in the absence of the substrate. The importance of these loops for the function is confirmed by extensive mutagenesis, followed up by enzyme kinetic characterizations. These loops cover the central Ser-Pro-Ser tripeptide of the substrate such that the hydroxylation occurs in a highly buried space. This novel mode of binding does not depend on stacking interactions of the proline side chains with aromatic residues. Major conformational changes of the two peptide binding loops are predicted to be a key feature of the catalytic cycle. These conformational changes are probably triggered by the conformational switch of Tyr140, as induced by the hydroxylation of the proline residue. The importance of these findings for understanding the specific binding and hydroxylation of (X-Pro-Gly)n sequences by collagen P4Hs is also discussed.
Prolyl 4-hydroxylases (P4Hs) are 2-oxoglutarate dioxygenases that catalyze the hydroxylation of peptidyl prolines. They play an important role in collagen synthesis, oxygen homeostasis, and plant cell wall formation. We describe four structures of a P4H from the green alga Chlamydomonas reinhardtii, two of the apoenzyme at 1.93 and 2.90 Å resolution, one complexed with the competitive inhibitor Zn2+, and one with Zn2+ and pyridine 2,4-dicarboxylate (which is an analogue of 2-oxoglutarate) at 1.85 Å resolution. The structures reveal the double-stranded β-helix core fold (jellyroll motif), typical for 2-oxoglutarate dioxygenases. The catalytic site is at the center of an extended shallow groove lined by two flexible loops. Mutagenesis studies together with the crystallographic data indicate that this groove participates in the binding of the proline-rich peptide-substrates. It is discussed that the algal P4H and the catalytic domain of collagen P4Hs have notable structural similarities, suggesting that these enzymes form a separate structural subgroup of P4Hs different from the hypoxia-inducible factor P4Hs. Key structural differences between these two subgroups are described. These studies provide first insight into the structure-function relationships of the collagen P4Hs, which unlike the hypoxia-inducible factor P4Hs use proline-rich peptides as their substrates.
Collagen prolyl 4-hydroxylases (C-P4Hs) catalyze the formation of 4-hydroxyproline by the hydroxylation of -X-Pro-Gly- triplets. The vertebrate enzymes are α2β2 tetramers, the β-subunit being identical to protein-disulfide isomerase (PDI). Two isoforms of the catalytic α-subunit, which combine with PDI to form [α(I)]2β2 and [α(II)]2β2 tetramers, have been known up to now. We report here on the cloning and characterization of a third vertebrate C-P4H α-subunit isoform, α(III). The processed human, rat and mouse α(III) polypeptides consist of 520-525 residues, all three having signal peptides of 19-22 additional residues. The sequence of the processed human α(III) polypeptide is 35-37% identical to those of human α(I) and α(II), the highest identity being found within the catalytically important C-terminal region and all five critical residues at the cosubstrate binding sites being conserved. The sequence within a region corresponding to the peptide-substrate binding domain is less conserved, but all five α helices constituting this domain can be predicted to be located in identical positions in α(I), α(II), and α(III) and to have essentially identical lengths. The α(III) mRNA is expressed in many human tissues, but at much lower levels than the α(I) and α(II) mRNAs. In contrast to α(I) and α(II), no evidence was found for alternative splicing of the α(III) transcripts. Coexpression of a recombinant human α(III) polypeptide with PDI in human embryonic kidney cells led to the formation of an active enzyme that hydroxylated collagen chains and a collagen-like peptide and appeared to be an [α(III)]2β2 tetramer. The catalytic properties of the recombinant enzyme were very similar to those of the type I and II C-P4Hs, with the exception that its peptide binding properties were intermediate between those of the type I and type II enzymes.