2009
DOI: 10.1101/gr.090597.108

mGene: Accurate SVM-based gene finding with an application to nematode genomes

Abstract: We present a highly accurate gene-prediction system for eukaryotic genomes, called mGene. It combines in an unprecedented manner the flexibility of generalized hidden Markov models (gHMMs) with the predictive power of modern machine learning methods, such as Support Vector Machines (SVMs). Its excellent performance was proved in an objective competition based on the genome of the nematode Caenorhabditis elegans. Considering the average of sensitivity and specificity, the developmental version of mGene exhibite…

Cited by 87 publications (60 citation statements)
References 44 publications
“…This approach allows us to extract information from various feature combinations in order to identify patterns within a known learning set (bait) and perform subsequent predictions on an unknown data set (prey). Machine learning algorithms have been used for biological data mining, including applications for prediction of protein targeting signals (see reference 70 for a review) or protein-protein interactions (37), and finding protein-encoding genes (72) and noncoding RNAs (51) within completely sequenced genomes. Using this approach we predicted and subsequently validated experimentally new hydrogenosomal proteins, some of which do not carry N-terminal targeting motifs.…”
mentioning
confidence: 99%
“…A SVM-based two-layer approach [94] consists of independent SVM signal and content detectors, and hidden semi-Markov (HSM) SVMs. The first layer is SVM feature recognition, while the second layer is gene structure reconstruction.…”
Section: Support Vector Machines and Kernel Methods
mentioning
confidence: 99%
“…Machine learning (ML) methods, particularly SVM-based methods, have been a significant methodology in genome analysis for solving a wide range of problems, including gene finding [94,198]. Recently, machine learning based methods [169,199] are being further used for gene prediction in metagenomic fragments.…”
Section: Machine Learning
mentioning
confidence: 99%
“…In order to improve the classification accuracy, many approaches have been proposed from the perspective of machine learning and pattern recognition (Guyon et al, 2002; Lee and Zhang, 2006; Nevins and Potti, 2007; Liu and Huang, 2008; Schweikert et al, 2009; Zheng et al, 2009; Cai et al, 2010; Leung and Hung, 2010; Zare et al, 2011; Zheng et al, 2011; Wang et al, 2012). Despite of the success achieved by these advanced techniques, the improvement for the classification accuracy remains limited, because they only deal with the data obtained from the biological experiments, which contains noise and missing values.…”
Section: Introduction
mentioning
confidence: 99%