Automatic Gene Function Prediction in the 2020’s

Makrodimitris, Stavros; Ham, Roeland C. H. J. van; Reinders, Marcel J. T.

doi:10.3390/genes11111264

Cited by 28 publications

(26 citation statements)

References 92 publications


“…Selecting evaluation metrics for AFP is a difficult and often overlooked task [18, 15]. Still, it has a drastic impact on results, and some popular evaluation metrics are not well suited for AFP task [18, 3, 8].…”

Section: Methods (mentioning)

confidence: 99%

Optimizing InterProScan representation generates a surprisingly good protein function prediction method

Tiittanen

Törönen

2022

Preprint

Motivation: Automated protein Function Prediction (AFP) is an intensively studied topic. Most of this research focuses on methods that combine multiple data sources, while fewer articles look for the most efficient ways to use a single data source. Therefore, we wanted to test how different preprocessing methods and classifiers would perform in the AFP task when we process the output from the InterProscan (IPS). Especially, we present novel preprocessing methods, less used classifiers and inclusion of species taxonomy. We also test classifier stacking for combining tested classifier results. Methods are tested with in-house data and CAFA3 competition evaluation data.

Results: We show that including IPS localisation and taxonomy to the data improves results. Also the stacking improves the performance. Surprisingly, our best performing methods outperformed all international CAFA3 competition participants in most tests. Altogether, the results show how preprocessing and classifier combinations are beneficial in the AFP task.

Contact: petri.toronen(AT)helsinki.fi

Supplementary information: Supplementary text is available at the project web site http://ekhidna2.biocenter.helsinki.fi/AFP

“…Other studies address the node classification problem and obtain state-of-the-art performance for different case studies (see, e.g., Abu-El-Haija et al 2019;Chen et al 2021;Hamilton et al 2017;Kipf and Welling 2017;Makrodimitris et al 2020;Xiao et al 2021). However, they do not take into account dependencies between classes (hierarchical or not), for they focus on multi-class instead of multi-label problems.…”

Section: Related Work (mentioning)

confidence: 99%

A top-down supervised learning approach to hierarchical multi-label classification in networks

2022

Node classification is the task of inferring or predicting missing node attributes from information available for other nodes in a network. This paper presents a general prediction model to hierarchical multi-label classification, where the attributes to be inferred can be specified as a strict poset. It is based on a top-down classification approach that addresses hierarchical multi-label classification with supervised learning by building a local classifier per class. The proposed model is showcased with a case study on the prediction of gene functions for Oryza sativa Japonica, a variety of rice. It is compared to the Hierarchical Binomial-Neighborhood, a probabilistic model, by evaluating both approaches in terms of prediction performance and computational cost. The results in this work support the working hypothesis that the proposed model can achieve good levels of prediction efficiency, while scaling up in relation to the state of the art.

“…The hierarchical structure of GO also complicates the evaluation of AFP methods. There is ongoing debate how to properly evaluate AFP models [18]. One of the biggest issues with the evaluation metrics is that one can get very good results, with some evaluation metrics, by simply reporting the GO classes in decreasing order of their frequency in the database, for every tested gene [19, 20].…”

Section: Introduction (mentioning)

confidence: 99%

PANNZER—A practical tool for protein function prediction

Törönen

2021

Protein Science

The facility of next‐generation sequencing has led to an explosion of gene catalogs for novel genomes, transcriptomes and metagenomes, which are functionally uncharacterized. Computational inference has emerged as a necessary substitute for first‐hand experimental evidence. PANNZER (Protein ANNotation with Z‐scoRE) is a high‐throughput functional annotation web server that stands out among similar publically accessible web servers in supporting submission of up to 100,000 protein sequences at once and providing both Gene Ontology (GO) annotations and free text description predictions. Here, we demonstrate the use of PANNZER and discuss future plans and challenges. We present two case studies to illustrate problems related to data quality and method evaluation. Some commonly used evaluation metrics and evaluation datasets promote methods that favor unspecific and broad functional classes over more informative and specific classes. We argue that this can bias the development of automated function prediction methods. The PANNZER web server and source code are available at http://ekhidna2.biocenter.helsinki.fi/sanspanz/.
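The concern raised in this citation statement and abstract, that a frequency-ranked list of GO classes can score well under some metrics, can be illustrated with a small sketch. The code below is not taken from any of the cited papers. It assumes a protein-centric Fmax of the kind used in CAFA-style evaluations, and the term names and toy annotations are hypothetical placeholders rather than real GO data.

```python
from collections import Counter


def naive_scores(train_annotations):
    """Score each term by the fraction of training proteins annotated with it."""
    counts = Counter(term for terms in train_annotations for term in terms)
    n = len(train_annotations)
    return {term: c / n for term, c in counts.items()}


def fmax(predictions, truth):
    """Protein-centric Fmax (assumed CAFA-style definition).

    predictions: list of dicts mapping term -> score, one per protein
    truth:       list of non-empty sets of true terms, one per protein
    """
    best = 0.0
    for t in (i / 100 for i in range(1, 101)):
        precisions, recalls = [], []
        for scores, true_terms in zip(predictions, truth):
            predicted = {term for term, s in scores.items() if s >= t}
            tp = len(predicted & true_terms)
            # Precision is averaged only over proteins with a prediction at t.
            if predicted:
                precisions.append(tp / len(predicted))
            # Recall is averaged over all proteins.
            recalls.append(tp / len(true_terms))
        if not precisions:
            continue
        p = sum(precisions) / len(precisions)
        r = sum(recalls) / len(recalls)
        if p + r > 0:
            best = max(best, 2 * p * r / (p + r))
    return best


# Hypothetical, already-propagated annotations (each set includes ancestors).
train = [
    {"root", "binding"},
    {"root", "binding", "protein_binding"},
    {"root", "catalytic", "kinase"},
    {"root", "binding"},
]
test_truth = [
    {"root", "binding", "dna_binding"},
    {"root", "catalytic"},
]

# The naive baseline gives every test protein the same frequency-ranked list.
scores = naive_scores(train)
naive_predictions = [scores for _ in test_truth]
naive_fmax = fmax(naive_predictions, test_truth)
print(f"Naive baseline Fmax: {naive_fmax:.3f}")
```

The baseline uses no information about the individual test proteins, yet it earns a substantial Fmax on this toy set because the broad, frequent terms near the root are almost always correct. This is the bias toward unspecific classes that the abstract describes.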
