2004
DOI: 10.1186/1471-2105-5-206

An empirical analysis of training protocols for probabilistic gene finders

Abstract: Background: Generalized hidden Markov models (GHMMs) appear to be approaching acceptance as a de facto standard for state-of-the-art ab initio gene finding, as evidenced by the recent proliferation of GHMM implementations. While prevailing methods for modeling and parsing genes using GHMMs have been described in the literature, little attention has been paid as yet to their proper training. The few hints available in the literature together with anecdotal observations suggest that most practitioners perform…
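To make concrete what "training" a GHMM gene finder involves, the sketch below shows the simplest conceivable protocol: maximum-likelihood estimation by counting over annotated training genes. It is a minimal illustration under stated assumptions, not the paper's protocol and not how GRAPE or any named gene finder is implemented. It assumes a hypothetical per-base labeling ('E' exon, 'I' intron, 'N' intergenic), zeroth-order nucleotide emissions with a pseudocount, and empirical segment-length histograms as the GHMM duration distributions. Real GHMM gene finders use richer state sets, signal sensors and higher-order models. The names `segments` and `train_ml` are hypothetical.

```python
from collections import Counter, defaultdict


def segments(labels):
    """Split a per-base label string into (label, length) runs."""
    runs = []
    for lab in labels:
        if runs and runs[-1][0] == lab:
            runs[-1][1] += 1
        else:
            runs.append([lab, 1])
    return [(lab, n) for lab, n in runs]


def train_ml(examples, pseudocount=1.0):
    """Count-based maximum-likelihood estimates from labeled training genes.

    examples: list of (sequence, labels) pairs of equal length, where labels
    use the hypothetical alphabet 'E' (exon), 'I' (intron), 'N' (intergenic).
    Returns (emission, duration, transition) probability tables.
    """
    emit = defaultdict(Counter)       # state -> nucleotide counts
    durations = defaultdict(Counter)  # state -> segment-length counts
    trans = defaultdict(Counter)      # state -> next-state counts
    for seq, labels in examples:
        if len(seq) != len(labels):
            raise ValueError("sequence and labels must have equal length")
        for base, lab in zip(seq, labels):
            emit[lab][base] += 1
        runs = segments(labels)
        for lab, n in runs:
            durations[lab][n] += 1
        # Transitions are counted between consecutive segments, as in a GHMM,
        # where each state emits a whole segment rather than a single base.
        for (a, _), (b, _) in zip(runs, runs[1:]):
            trans[a][b] += 1

    alphabet = "ACGT"
    emission = {}
    for state, counts in emit.items():
        # Pseudocounts (an assumption) keep unseen bases from getting zero probability.
        total = sum(counts[b] for b in alphabet) + pseudocount * len(alphabet)
        emission[state] = {b: (counts[b] + pseudocount) / total for b in alphabet}
    duration = {
        s: {n: c / sum(cnt.values()) for n, c in cnt.items()}
        for s, cnt in durations.items()
    }
    transition = {
        s: {t: c / sum(cnt.values()) for t, c in cnt.items()}
        for s, cnt in trans.items()
    }
    return emission, duration, transition
```

For example, `train_ml([("ATGCCGTAAGCT", "NNEEEEIIIEEN")])` gives exon emission probabilities of 0.4 for G and C and 0.1 for A and T (pseudocount 1), and an exon length distribution of {4: 0.5, 2: 0.5}.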

Cited by 24 publications (6 citation statements)
References 28 publications (25 reference statements)
“…AUGUSTUS for maize is used for the Poaceae family. For GeneZilla, we employed an automatic training program, GRAPE ( Majoros and Salzberg 2004 ). To train the other programs, we followed the instructions provided by the software developer.…”
Section: Methods (mentioning; confidence: 99%)
“…We also used three methods to annotate genes in the genome: de novo prediction, homology-based prediction, and transcriptome-based prediction, whose results were combined using EvidenceModeler (EVM) v1.1.1 (Haas et al, 2008) and PASA v2.4.0 (Haas et al, 2008). For de novo gene predictions, we used Augustus v3.4.0 (Stanke et al, 2008), Genscan v3.1 (Burge and Karlin, 1998), and GlimmerHMM v3.0.1 (Majoros and Salzberg, 2004) to analyze the repeat-masked genome. For homology-based predictions, the protein sequences of Cynoglossus semilaevis, Danio rerio, Gasterosteus aculeatus, Gadus morhua, Larimichthys crocea, Oryzias latipes, Oreochromis niloticus, T. rubripes, T. bimaculatus and Tetraodon nigroviridis obtained from NCBI were aligned to the yellow boxfish genome following the pipeline of Chen et al (Chen et al, 2019).…”
Section: Genome Annotation (mentioning; confidence: 99%)
“…For RNA-seq data, we first used HISAT v2.1.0 (Kim et al, 2015) with default parameters to align RNA-seq data to the V. variegatus genome and then used StringTie v2.0 (Pertea et al, 2015) with default parameters to reconstruct transcripts. After using RepeatMasker to mask TEs of the assembled genome, five de novo gene predictors, including Augustus (Stanke et al, 2008), GlimmerHMM (Majoros and Salzberg, 2004), SNAP (Korf, 2004), Geneid (Alioto et al, 2018) and Genscan (Burge and Karlin, 1998), were used for gene prediction. For the homology-based prediction, protein sequences of Homo sapiens, Danio rerio, Oryzias latipes, Takifugu rubripes, Cynoglossus semilaevis, Scophthalmus maximus and Gasterosteus aculeatus were downloaded from Ensembl (release 98) and Paralichthys olivaceus proteins were downloaded from NCBI; we then used Exonerate v2.2 (Slater and Birney, 2005) (identity > 80%) to map the protein sequences to the V. variegatus genome for homology-based gene prediction.…”
Section: Gene Structure and Functional Annotation (mentioning; confidence: 99%)