Protein function prediction is a central problem in bioinformatics, increasing in importance recently due to the rapid accumulation of biological data awaiting interpretation. Sequence data represents the bulk of this new stock and is the obvious target for consideration as input, as newly sequenced organisms often lack any other type of biological characterization. We have previously introduced PFP (Protein Function Prediction) as our sequence-based predictor of Gene Ontology (GO) functional terms. PFP interprets the results of a PSI-BLAST search by extracting and scoring individual functional attributes, searching a wide range of E-value sequence matches, and utilizing conventional data mining techniques to fill in missing information. We have shown it to be effective in predicting both specific and low-resolution functional attributes when sufficient data is unavailable. Here we describe (1) significant improvements to the PFP infrastructure, including the addition of prediction significance and confidence scores, (2) a thorough benchmark of performance and comparisons to other related prediction methods, and (3) applications of PFP predictions to genome-scale data. We applied PFP predictions to uncharacterized protein sequences from 15 organisms. Among these sequences, 60-90% could be annotated with a GO molecular function term at high confidence (≥80%). We also applied our predictions to the protein-protein interaction network of the malaria parasite (Plasmodium falciparum). High confidence GO biological process predictions (≥90%) from PFP increased the proportion of fully enriched interactions in this dataset from 23% to 94%. Our benchmark comparison shows significant performance improvement of PFP relative to GOtcha, InterProScan, and PSI-BLAST predictions. This is consistent with the performance of PFP as the overall best predictor in both the AFP-SIG '05 and CASP7 function (FN) assessments.
PFP is available as a web service at http://dragon.bio.purdue.edu/pfp/.
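The scoring idea in the abstract above (collect GO terms from PSI-BLAST hits across a wide E-value range, then score each term individually) can be illustrated with a minimal Python sketch. This is not the PFP implementation: the function name `score_go_terms`, the weight `-log10(E) + b`, the offset `b = 2.0`, and the `assoc` table of term-to-term association weights are all assumptions chosen for illustration, and the GO identifiers and E-values in the usage lines are made-up inputs.

```python
import math
from collections import defaultdict


def score_go_terms(hits, assoc=None, b=2.0):
    """Rank GO terms found on sequence-search hits (illustrative only).

    hits  : list of (e_value, [go_term, ...]) pairs, one per hit.
    assoc : hypothetical table {observed_term: {associated_term: weight}}
            standing in for data-mined term associations.
    b     : assumed offset so that hits with E-values above 1 still
            contribute a small positive weight.
    """
    assoc = assoc or {}
    scores = defaultdict(float)
    for e_value, terms in hits:
        # Clamp the E-value to avoid log10(0) for perfect matches.
        weight = -math.log10(max(e_value, 1e-300)) + b
        if weight <= 0:
            continue  # hit is too weak to contribute under this weighting
        for term in terms:
            # Direct evidence: the term annotates this hit.
            scores[term] += weight
            # Indirect evidence: terms associated with the observed term.
            for other, p in assoc.get(term, {}).items():
                if other != term:
                    scores[other] += weight * p
    # Highest-scoring terms first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Usage with made-up hits: one strong match and two weak ones.
example_hits = [
    (1e-50, ["GO:0003824"]),
    (10.0, ["GO:0003824", "GO:0005524"]),
    (1e-3, ["GO:0005524"]),
]
example_assoc = {"GO:0005524": {"GO:0016301": 0.5}}
ranking = score_go_terms(example_hits, assoc=example_assoc)
```

Under this weighting a strong hit dominates the ranking, while weak hits (here one with an E-value of 10) still add a small amount of evidence, which is the behaviour the abstract attributes to searching a wide range of E-values.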
The impetus for the recent development and emergence of automated function prediction methods is an exponentially growing flood of new experimental data, the interpretation of which is hindered by a shortage of reliable annotations for proteins that lack experimental characterization or significant homologs in current databases. Here we introduce PFP, an automated function prediction server that provides the most probable annotations for a query sequence in each of the three branches of the Gene Ontology: biological process, molecular function, and cellular component. Rather than utilizing precise pattern matching to identify functional motifs in the sequences and structures of these proteins, we designed PFP to increase the coverage of function annotation by lowering resolution of predictions when a detailed function is not predictable. To do this we extend a traditional PSI-BLAST search by extracting and scoring annotations (GO terms) individually, including annotations from distantly related sequences, and applying a novel data mining tool, the Function Association Matrix, to score strongly associated pairs of annotations. We show that PFP can correctly assign function using only weakly similar sequences with a significantly better accuracy and coverage than a standard PSI-BLAST search, improving it more than fivefold. The most descriptive annotations predicted by PFP (GO depth ≥ 8) can identify a significant subgraph in the GO with >60% accuracy and ~100% coverage for our benchmark set. We also provide examples of the superb performance of PFP in an assessment of automated function prediction servers at the Automated Function Prediction Special Interest Group meeting at ISMB 2005 (AFP-SIG '05).
Keywords: protein function prediction; PSI-BLAST; gene ontology; low-resolution function
The fields of cell and molecular biology have as a main focus the task of clearly defining cellular roles for all proteins encoded by the DNA existing in a genome. This involves describing for each protein its biochemical function(s), cellular location(s), participation in various cellular processes, structure, interactions, etc. Recently developed technologies for molecular biology are increasingly broad, both in scope and scale. The bioinformatics community has been called upon to extract and interpret patterns in the glut of new experimental data produced by these technologies, so that they may be utilized to their full capacity.
Automated protein function prediction methods are emerging both as interpretive techniques for high-throughput experimental datasets (e.g., expression microarrays, interaction screens) and as partners to structural genomics projects (Watson et al. 2005). These algorithms …
Reprint requests to: Daisuke Kihara, Department of Biological Sciences, Purdue University, West Lafayette, IN 47907, USA; e-mail: dkihara@purdue.edu; fax: (765) 496-1189.
Article published online ahead of print. Article and publication date are at http://www.proteinscience.org/cgi
It is essential to improve therapies for controlling excessive bleeding in patients with haemorrhagic disorders. As activated blood platelets mediate the primary response to vascular injury, we hypothesize that storage of coagulation Factor VIII within platelets may provide a locally inducible treatment to maintain haemostasis for haemophilia A. Here we show that haematopoietic stem cell gene therapy can prevent the occurrence of severe bleeding episodes in dogs with haemophilia A for at least 2.5 years after transplantation. We employ a clinically relevant strategy based on a lentiviral vector encoding the ITGA2B gene promoter, which drives platelet-specific expression of human FVIII permitting storage and release of FVIII from activated platelets. One animal receives a hybrid molecule of FVIII fused to the von Willebrand Factor propeptide-D2 domain that traffics FVIII more effectively into α-granules. The absence of inhibitory antibodies to platelet-derived FVIII indicates that this approach may have benefit in patients who reject FVIII replacement therapies. Thus, platelet FVIII may provide effective long-term control of bleeding in patients with haemophilia A.
ESG web server is available for automated protein function prediction at http://dragon.bio.purdue.edu/ESG/.
Function prediction of uncharacterized protein sequences generated by genome projects has emerged as an important focus for computational biology. We have categorized several approaches beyond traditional sequence similarity that utilize the overwhelmingly large amounts of available data for computational function prediction, including structure-, association (genomic context)-, interaction (cellular context)-, process (metabolic context)-, and proteomics-experiment-based methods. Because they incorporate structural and experimental data that is not used in sequence-based methods, they can provide additional accuracy and reliability to protein function prediction. Here, first we review the definition of protein function. Then the recent developments of these methods are introduced with special focus on the type of predictions that can be made. The need for further development of comprehensive systems biology techniques that can utilize the ever-increasing data presented by the genomics and proteomics communities is emphasized. For the readers' convenience, tables of useful online resources in each category are included. The role of computational scientists in the near future of biological research and the interplay between computational and experimental biology are also addressed.