An expanded evaluation of protein function prediction methods shows an improvement in accuracy

Jiang, Yuxiang; Oron, Tal; Clark, Wyatt T.; Bankapur, Asma; D’Andrea, Daniel; Lepore, Rosalba; Funk, Christopher; Kahanda, Indika; Verspoor, Karin; Ben-Hur, Asa; Koo, Da Chen Emily; Penfold-Brown, Duncan; Shasha, Dennis; Youngs, Noah; Bonneau, Richard; Lin, Alexandra J.; Sahraeian, Sayed Mohammad Ebrahim; Martelli, Pier Luigi; Profiti, Giuseppe; Casadio, Rita; Cao, Renzhi; Zhong, Zhaolong; Cheng, Jianlin; Altenhoff, Adrian M.; Škunca, Nives; Dessimoz, Christophe; Doğan, Tunca; Hakala, Kai; Kaewphan, Suwisa; Mehryary, Farrokh; Salakoski, Tapio; Ginter, Filip; Fang, Hai; Smithers, Ben; Oates, Matt; Gough, Julian; Törönen, Petri; Koskinen, Patrik; Holm, Liisa; Chen, Ching-Tai; Hsu, Wen-Lian; Bryson, Kevin; Cozzetto, Domenico; Minneci, Federico; Jones, David T.; Chapman, Samuel; Bkc, Dukka; Khan, Ishita; Kihara, Daisuke; Ofer, Dan; Rappoport, Nadav; Stern, Amos; Cibrián-Uhalte, Elena; Denny, Paul; Foulger, Rebecca E.; Hieta, Reija; Legge, Duncan; Lovering, Ruth C.; Magrane, Michele; Melidoni, Anna N.; Mutowo, Prudence; Pichler, Klemens; Shypitsyna, Aleksandra; Li, Biao; Zakeri, Pooya; ElShal, Sarah; Tranchevent, Léon-Charles; Das, Sayoni; Dawson, Natalie L.; Lee, David; Lees, Jonathan G.; Sillitoe, Ian; Bhat, Prajwal; Nepusz, Tamás; Romero, Alfonso E.; Sasidharan, Rajkumar; Yang, Haixuan; Paccanaro, Alberto; Gillis, Jesse; Sedeño-Cortés, Adriana E.; Pavlidis, Paul; Feng, Shou; Cejuela, Juan Miguel; Goldberg, Tatyana; Hamp, Tobias; Richter, Lothar; Salamov, Asaf; Gabaldón, Toni; Marcet-Houben, Marina; Supek, Fran; Gong, Qingtian; Ning, Wei; Zhou, Yuanpeng; Tian, Weidong; Falda, Marco; Fontana, Paolo; Lavezzo, Enrico; Toppo, Stefano; Ferrari, Carlo; Giollo, Manuel; Piovesan, Damiano; Tosatto, Silvio C. E.; Pozo, Ángela del; Fernández, José María; Maietta, Paolo; Valencia, Alfonso; Tress, Michael L.; Benso, Alfredo; Carlo, Stefano Di; Politano, Gianfranco; Savino, Alessandro; Rehman, Hafeez Ur; Ré, Matteo; Mesiti, Marco; Valentini, Giorgio; Bargsten, Joachim W.; Dijk, Aalt D. J. van; Gemović, Branislava; Glišić, Sanja; Perovic, Vladimir; Veljković, Veljko; Veljković, Nevena; Almeida-E-Silva, Danillo C.; Vêncio, Ricardo Zorzetto Nicoliello; Sharan, Malvika; Vogel, Jörg; Kansakar, Lakesh; Zhang, Shanshan; Vučetić, Slobodan; Wang, Zheng; Sternberg, Michael J.E.; Wass, Mark N.; Huntley, Rachael P.; Martin, Maria Jesus; O'Donovan, Claire; Robinson, Peter N.; Moreau, Yves; Tramontano, Anna; Babbitt, Patricia C.; Brenner, Steven E.; Linial, Michal; Orengo, Christine A.; Rost, Burkhard; Greene, Casey S.; Mooney, Sean D.; Friedberg, Iddo; Radivojac, Predrag

doi:10.1186/s13059-016-1037-6

Cited by 371 publications (494 citation statements)

References 19 publications

Mentioning: 478

“…Functional annotations relating to musculoskeletal disease, especially OA, are poor and result in spurious descriptors. These important issues have been realized [8] and methods to improve annotations are being developed [9]…”

Section: Biology As a System (mentioning)

confidence: 99%

Systems approaches in osteoarthritis: Identifying routes to novel diagnostic and therapeutic strategies

Mueller, Peffers, Proctor et al. 2017, J. Orthop. Res.

Systems orientated research offers the possibility of identifying novel therapeutic targets and relevant diagnostic markers for complex diseases such as osteoarthritis. This review demonstrates that the osteoarthritis research community has been slow to incorporate systems orientated approaches into research studies, although a number of key studies reveal novel insights into the regulatory mechanisms that contribute both to joint tissue homeostasis and its dysfunction. The review introduces both top‐down and bottom‐up approaches employed in the study of osteoarthritis. A holistic and multiscale approach, where clinical measurements may predict dysregulation and progression of joint degeneration, should be a key objective in future research. The review concludes with suggestions for further research and emerging trends not least of which is the coupled development of diagnostic tests and therapeutics as part of a concerted effort by the osteoarthritis research community to meet clinical needs. © 2017 The Authors. Journal of Orthopaedic Research Published by Wiley Periodicals, Inc. on behalf of Orthopaedic Research Society. J Orthop Res 35:1573–1588, 2017.


“…These algorithms are recognized as powerful alternative method for the functional prediction of both proteins [66][67][68][69][70] and other biomolecules [71]. However, over one third of the protein sequences in the UniProt [26] are still labeled as "putative", "uncharacterized", "unknown function" or "hypothetical", and the difficulty in discovering the functional class of the remaining proteins are reported to come from the false discovery rate of the in-silico methods [55,56,72]. Moreover, the identification accuracies of those approaches still need to be further improved [55,56,73].…”

Section: Introduction (mentioning)

confidence: 99%

“…However, over one third of the protein sequences in the UniProt [26] are still labeled as "putative", "uncharacterized", "unknown function" or "hypothetical", and the difficulty in discovering the functional class of the remaining proteins are reported to come from the false discovery rate of the in-silico methods [55,56,72]. Moreover, the identification accuracies of those approaches still need to be further improved [55,56,73]. Thus, it is urgently necessary to assess the identification accuracies and false discovery rates among those different in-silico approaches.…”

Section: Introduction (mentioning)

confidence: 99%

“…These include sequence similarity [27,28] Among these in-silico methods [52], the basic local alignment search tool (BLAST) [53] revealing protein functions based on excess sequence similarity [54] demonstrated great capacity and attracted substantial interests from the researchers of this field [55,56]. Apart from BLAST, the methods based on the machine learning algorithm (a specific type of artificial intelligence) were frequently used in recent years to predict protein function [57][58][59][60][61][62], and various types of software together with several web-based tools integrating these methods were developed to predict the protein function from sequences irrespective of sequence or structural similarity [36,63].…”

Section: Introduction (mentioning)

confidence: 99%


Assessing the Performances of Protein Function Prediction Algorithms from the Perspectives of Identification Accuracy and False Discovery Rate

Yu, Li, Yang et al. 2017, Preprint

Knowledge of protein function is essential for the study of biological processes, the understanding of disease mechanisms and the exploration of novel therapeutic targets. Apart from experimental methods, a number of in-silico approaches have been developed and extensively used for protein function prediction. Among these approaches, BLAST predicts functions based on protein sequence similarity, and machine learning predicts functional families from protein sequences irrespective of their similarity, which complements BLAST and other methods in predicting diverse classes of proteins including distantly related proteins and homologous proteins of different functions. However, their identification accuracies and false discovery rates have not yet been assessed, which greatly limits the usage of these prediction algorithms. Herein, a comprehensive comparison of the performance of four popular functional prediction algorithms (BLAST, SVM, PNN and KNN) was conducted. In particular, the performance of these algorithms was systematically assessed by four metrics (sensitivity, specificity, accuracy and Matthews correlation coefficient) based on independent test datasets generated from 93 protein families defined by UniProtKB Keywords. Moreover, the false discovery rates of these algorithms were evaluated by scanning the genomes of four representative model species (Homo sapiens, Arabidopsis thaliana, Saccharomyces cerevisiae and Mycobacterium tuberculosis). As a result, substantially higher sensitivity and stability were observed for BLAST and SVM than for PNN and KNN. However, the machine learning algorithms (PNN, KNN and SVM) were found capable of significantly reducing the false discovery rate (SVM < PNN ≈ KNN). In summary, this study comprehensively assessed the performance of four popular algorithms applied to protein function prediction, which could facilitate the selection of the most appropriate method in related biomedical research.

Keywords: false discovery rate; machine learning; protein function prediction; support vector machine; BLAST

Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted:
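The four metrics named in the abstract above (sensitivity, specificity, accuracy and Matthews correlation coefficient) can all be derived from the counts of a binary confusion matrix. The following is a minimal sketch using the standard textbook definitions; the preprint's exact formulas are not shown on this page, so treating them as identical is an assumption. The function name `binary_metrics` and the example counts are hypothetical and are not taken from the study.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return standard binary-classification metrics from confusion counts.

    tp/fp/tn/fn are true positives, false positives, true negatives and
    false negatives. Ratios with a zero denominator are reported as 0.0
    (a convention chosen for this sketch, not prescribed by the source).
    """
    total = tp + fp + tn + fn
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # true negative rate
    accuracy = (tp + tn) / total if total else 0.0
    # Matthews correlation coefficient: ranges from -1 to 1.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Hypothetical counts for one protein family, for illustration only.
    for name, value in binary_metrics(tp=40, fp=10, tn=45, fn=5).items():
        print(f"{name}: {value:.3f}")
```

Unlike accuracy, the Matthews correlation coefficient uses all four counts symmetrically, which makes it more informative when positive and negative examples are imbalanced, as is common for protein family test sets.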


“…ML approaches are the state of the art in most non-classic prediction challenges. These methods are applied in community annotation challenges such as Critical Assessment of protein Function Annotation (CAFA) (5,6), and Critical Assessment for Information Extraction in Biology (BioCreAtIvE) (7). ML approaches actually benefit from the growth of available sequences, while 'brittle' rulebased methods often fail to cope with the growing variability and quantity of possible annotations and sequences.…”

Section: Introduction (mentioning)

confidence: 99%

ASAP: A Machine-Learning Framework for Local Protein Properties

Ofer, Brandes, Linial 2015, Preprint (self cite)

Determining residue-level protein properties, such as sites of post-translational modifications (PTMs), is vital to understanding protein function. Experimental methods are costly and time-consuming, while traditional rule-based computational methods fail to annotate sites lacking substantial similarity. Machine Learning (ML) methods are becoming fundamental in annotating unknown proteins and their heterogeneous properties. We present ASAP (Amino-acid Sequence Annotation Prediction), a universal ML framework for predicting residue-level properties. ASAP extracts numerous features from raw sequences, and supports easy integration of external features such as secondary structure, solvent accessibility, intrinsic disorder or PSSM profiles. Features are then used to train ML classifiers. ASAP can create new classifiers within minutes for a variety of tasks, including PTM prediction (e.g. cleavage sites by convertase, phosphoserine modification). We present a detailed case study for ASAP: CleavePred, an ASAP-based model to predict protein precursor cleavage sites, with state-of-the-art results. Protein cleavage is a PTM shared by a wide variety of proteins sharing minimal sequence similarity. Current rule-based methods suffer from high false positive rates, making them suboptimal. The high performance of CleavePred makes it suitable for analyzing new proteomes at a genomic scale. The tool is attractive for protein design, mass spectrometry search engines and the discovery of new bioactive peptides from precursors. ASAP functions as a baseline approach for residue-level protein sequence prediction. CleavePred is freely accessible as a web-based application. Both ASAP and CleavePred are open-source with a flexible Python API. Database URL: ASAP's and CleavePred's source code, webtool and tutorials are available at:
