2013
DOI: 10.1038/nmeth.2340

A large-scale evaluation of computational protein function prediction

Abstract: Automated annotation of protein function is challenging. As the number of sequenced genomes rapidly grows, the overwhelming majority of protein products can only be annotated computationally. If computational predictions are to be relied upon, it is crucial that the accuracy of these methods be high. Here we report the results from the first large-scale community-based Critical Assessment of protein Function Annotation (CAFA) experiment. Fifty-four methods representing the state-of-the-art for protein function…

Cited by 888 publications (1,025 citation statements)
References 52 publications
“…We applied our algorithm to the Automated protein Function Prediction (AFP) problem, a challenging and central problem in computational biology [53,50]. In this setting, nodes represent proteins and connections their pairwise relationships deriving from different sources of information, including gene co-expression, genetic and physical interactions, protein ontologies and phenotype annotations.…”
Section: Methods (mentioning)
confidence: 99%
“…We compared our method with several state-of-the-art competitors: GBA, an algorithm based on the guilt-by-association principle [39]; GeneMANIA [44], the top method in the MouseFunc challenge [52]; MS-kNN [32], one of the top ranking algorithm in the recent CAFA challenge [53]; LP, a semisupervised label propagation algorithm based on Gaussian random fields, and its class mass normalized version LP-CMN [70]; RW, the classical random walk algorithm without restart with at most 1000 random walk steps [36]; COSNet, a recently proposed algorithm for label prediction in graph, which is explicitly designed to cope with the imbalance in the instance labeling [21]. Furthermore, to assess the improvements introduced by partitioning proteins into categories, we also apply a different version of HoMCat, in which all the proteins are considered belonging to the same category (named HoMCat-1c).…”
Section: Comparison With State-of-the-art Methods (mentioning)
confidence: 99%
“…These include sequence similarity [27,28] Among these in-silico methods [52], the basic local alignment search tool (BLAST) [53] revealing protein functions based on excess sequence similarity [54] demonstrated great capacity and attracted substantial interests from the researchers of this field [55,56]. Apart from BLAST, the methods based on the machine learning algorithm (a specific type of artificial intelligence) were frequently used in recent years to predict protein function [57][58][59][60][61][62], and various types of software together with several web-based tools integrating these methods were developed to predict the protein function from sequences irrespective of sequence or structural similarity [36,63].…”
Section: Introduction (mentioning)
confidence: 99%
“…ML approaches are the state of the art in most non-classic prediction challenges. These methods are applied in community annotation challenges such as Critical Assessment of protein Function Annotation (CAFA) (5,6), and Critical Assessment for Information Extraction in Biology (BioCreAtIvE) (7). ML approaches actually benefit from the growth of available sequences, while 'brittle' rule-based methods often fail to cope with the growing variability and quantity of possible annotations and sequences.…”
Section: Introduction (mentioning)
confidence: 99%