2019
DOI: 10.1101/690271
Preprint

Neural network and random forest models in protein function prediction

Abstract: Over the past decade, the demand for automated protein function prediction has increased due to the volume of newly sequenced proteins. In this paper, we address the function prediction task by developing an ensemble system automatically assigning Gene Ontology (GO) terms to the given input protein sequence. We develop an ensemble system which combines the GO predictions made by random forest (RF) and neural network (NN) classifiers. Both RF and NN models rely on features derived from BLAST sequence alignments,…

Cited by 5 publications (10 citation statements)
References 52 publications
“…Here the aim is to allow a classifier to generate different predictions with same sequence feature when the sequence occurs in different regions of species taxonomy tree. We were the first research group to use taxonomy in our AFP method [14], and it has been since used, to our knowledge, by only one other research group [11]. Here we used a script (from [22]) that takes the species taxonomy identifier, maps it to NCBI taxonomy hierarchy and links species to its taxonomic groups.…”
Section: Methods; citation type: mentioning
confidence: 99%
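
The taxonomy step quoted above (species identifier, then hierarchy, then taxonomic groups) can be illustrated with a minimal sketch. This is not the script from [22]: the parent map, the node names, and the binary-indicator encoding are all assumptions made for illustration, and a real run would load the parent relation from the NCBI taxonomy dump rather than a hand-written dictionary.

```python
from typing import Dict, List


def lineage(tax_id: str, parent: Dict[str, str]) -> List[str]:
    """Return tax_id followed by its ancestors, walking parent links to the root."""
    path = [tax_id]
    seen = {tax_id}
    # Stop at a node with no parent entry or at a self-parented root.
    while path[-1] in parent and parent[path[-1]] not in seen:
        nxt = parent[path[-1]]
        path.append(nxt)
        seen.add(nxt)
    return path


def taxonomy_features(tax_id: str, parent: Dict[str, str], groups: List[str]) -> List[int]:
    """One binary indicator per group: 1 if the group lies on the species' lineage."""
    on_path = set(lineage(tax_id, parent))
    return [1 if g in on_path else 0 for g in groups]


# Toy hierarchy with hypothetical node names (not real NCBI identifiers).
TOY_PARENT = {
    "species_a": "genus_x",
    "species_b": "genus_y",
    "genus_x": "family_m",
    "genus_y": "family_m",
    "family_m": "root",
    "root": "root",  # the root is its own parent
}
TOY_GROUPS = ["genus_x", "genus_y", "family_m"]

features_a = taxonomy_features("species_a", TOY_PARENT, TOY_GROUPS)
features_b = taxonomy_features("species_b", TOY_PARENT, TOY_GROUPS)
```

With features like these, two proteins with identical sequence features but different species receive different inputs, which is what lets a classifier predict differently across regions of the taxonomy tree.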
“…Combining AFP predictions from different classifiers is often used to increase prediction performance. This is usually done by just pooling all the predictions [8, 12], or using a weighting scheme in pooling for increased performance [18]. Usually, the pooling is optimized over all classes simultaneously, i.e., the same classifier weight distribution is used in pooling for all classes.…”
Section: Supplementary Text; citation type: mentioning
confidence: 99%
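
The contrast drawn in this statement, one weight distribution shared by all classes versus weights chosen per class, can be sketched as follows. The data layout, function names, GO term identifiers, and weights are hypothetical and chosen only for illustration; this is not the pooling scheme of any cited method.

```python
from typing import Dict

# Assumed layout: scores[classifier][go_term] = predicted score in [0, 1].
Scores = Dict[str, Dict[str, float]]


def pool_global(scores: Scores, weights: Dict[str, float]) -> Dict[str, float]:
    """Weighted average using one weight per classifier, shared by every GO term."""
    total = sum(weights.values())
    terms = sorted({t for per_term in scores.values() for t in per_term})
    return {
        t: sum(weights[c] * scores[c].get(t, 0.0) for c in scores) / total
        for t in terms
    }


def pool_per_class(scores: Scores, class_weights: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Weighted average where each GO term has its own classifier weights."""
    pooled = {}
    for term, w in class_weights.items():
        total = sum(w.values())
        pooled[term] = sum(w[c] * scores[c].get(term, 0.0) for c in w) / total
    return pooled


# Illustrative scores from two classifiers for two GO terms.
scores = {
    "nn": {"GO:0003677": 0.9, "GO:0005634": 0.4},
    "rf": {"GO:0003677": 0.5, "GO:0005634": 0.8},
}
global_pooled = pool_global(scores, {"nn": 0.5, "rf": 0.5})
per_class_pooled = pool_per_class(
    scores,
    {
        "GO:0003677": {"nn": 0.75, "rf": 0.25},
        "GO:0005634": {"nn": 0.25, "rf": 0.75},
    },
)
```

In practice the weights would be fitted on held-out data; per-class weights let the ensemble trust a different classifier for each term, at the cost of more parameters to estimate.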
“…In recent years, some organizations and teams have developed algorithms, tools, and systems for protein function prediction using advanced computer technologies, such as machine learning and deep neural networks (Kulmanov et al, 2018; You et al, 2018, 2019; Hakala et al, 2019; Lv et al, 2019b; Piovesan and Tosatto, 2019; Rifaioglu et al, 2019; Kulmanov and Hoehndorf, 2020). Researchers predict protein functions from one or more of the followings: protein sequences (Kulmanov et al, 2018; You et al, 2018, 2019; Hakala et al, 2019; Piovesan and Tosatto, 2019; Kulmanov and Hoehndorf, 2020), protein structures (Yang et al, 2015; Zhang et al, 2018), protein protein interactions (PPI) network (Kulmanov et al, 2018; Zhang et al, 2018; You et al, 2019), and others (Kahanda and Ben-Hur, 2017; Hakala et al, 2019; Piovesan and Tosatto, 2019; Rifaioglu et al, 2019). For example specifically, GOLabeler (You et al, 2018) integrated five different types of sequence-based information and learned from the idea of web page ranking to train an LTR (learning to rank) regression model to receive these five types of information to achieve accurate annotation of GO terms.…”
Section: Introduction; citation type: mentioning
confidence: 99%
“…Compared with GOLabeler, it has achieved a significant improvement in protein function prediction performance. Hakala et al (2019) developed an integrated system, which obtain features from several different tools or methods: BLASTP, InterproScan, NCBI Taxonomy, NucPred, NetAcet, PredGPI, and Amino Acid Index (Kawashima and Kanehisa, 2000; Heddad et al, 2004; Kiemer et al, 2005; Pierleoni et al, 2008; Camacho et al, 2009; Federhen, 2012; Jones et al, 2014), and then respectively feed all the features to two classifiers based on neural network and random forest and finally combined the NN classifier and the RF classifier to achieve the best prediction performance. DeepGO (Kulmanov et al, 2018) encodes the amino acid sequence of the protein by trigrams and maps the trigrams to vector by one-hot encoding and dense embedding, and then feed it to a convolutional neural network (CNN) to extract the feature map.…”
Section: Introduction; citation type: mentioning
confidence: 99%
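
The trigram encoding attributed to DeepGO in this statement can be sketched roughly as below. Overlapping trigrams, the 20-letter alphabet, and index 0 for trigrams containing non-standard residues are assumptions of this sketch, not details confirmed by the quoted text; the resulting integer indices are what a one-hot or dense embedding layer would then consume.

```python
from itertools import product
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

# Vocabulary of all 20^3 = 8000 trigrams; index 0 is reserved for unknown trigrams.
TRIGRAM_INDEX = {
    "".join(p): i + 1 for i, p in enumerate(product(AMINO_ACIDS, repeat=3))
}


def trigram_ids(sequence: str) -> List[int]:
    """Map a protein sequence to the indices of its overlapping trigrams."""
    seq = sequence.upper()
    # A sequence shorter than 3 residues yields an empty list.
    return [TRIGRAM_INDEX.get(seq[i:i + 3], 0) for i in range(len(seq) - 2)]


ids = trigram_ids("AAAC")  # trigrams: "AAA", "AAC"
```

A sequence of length n yields n - 2 indices, so variable-length proteins would still need padding or truncation before being fed to a CNN.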