1998
DOI: 10.1089/cmb.1998.5.681

Assembling Genes from Predicted Exons in Linear Time with Dynamic Programming

Abstract: In a number of programs for gene structure prediction in higher eukaryotic genomic sequences, exon

Cited by 99 publications (56 citation statements)
References 23 publications
“…The genome was assembled by using Phrap. Gene models were constructed by combining sequencing information from 1,536 EST sequences and in silico predictions from TWIN-SCAN (31), GeneID (32), and GeneWise (33) (SI Text). Putative gene function assignments were obtained through homology searches against the National Center for Biotechnology Information nr, PFAM (34), and TIGRFAM (35) databases.…”
Section: Discussion (mentioning)
confidence: 99%
“…GeneWise 56 was used to predict the exact gene structure of the corresponding genomic regions on each BLAST hit. Five ab initio gene prediction programs, Augustus (version 2.5.5) 57 , Genscan (version 1.0) 58 , GlimmerHMM (version 3.0.1) 59 , Geneid 60 , and SNAP 61 , were used to predict coding regions in the repeat-masked genome. Finally, RNA-seq data were mapped to the assembly using Tophat (version npg PEG, heat and cold.…”
Section: Methods (mentioning)
confidence: 99%
“…We tested two additional models of coding DNA before deciding for a Markov model of order 5, a Codon usage model, and a model that combined a Markov model of order 1 of the translated amino acid sequence and a Codon preference model (see Guigó 1999 for details on these models). In both cases, log-likelihood ratios were obtained in a similar way to the Markov model log-likelihood ratios (see Methods).…”
Section: Training Geneid (mentioning)
confidence: 99%
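
The log-likelihood ratio scoring mentioned in the statement above can be illustrated with a minimal sketch. It assumes a plain order-k Markov model with one coding table and one background table; the function name `llr_score` and the toy order-1 probabilities are hypothetical and are not GeneID's trained parameters, and the sketch ignores reading frame and the score of the initial k-mer.

```python
import math
from typing import Dict

# Hypothetical table type: context (k previous bases) -> next base -> probability.
Table = Dict[str, Dict[str, float]]


def llr_score(seq: str, coding: Table, background: Table, order: int) -> float:
    """Sum log(P_coding / P_background) over every base that has a full context."""
    total = 0.0
    for i in range(order, len(seq)):
        context, base = seq[i - order:i], seq[i]
        total += math.log(coding[context][base] / background[context][base])
    return total


# Toy order-1 tables (assumed values, for illustration only).
BASES = "ACGT"
background = {c: {b: 0.25 for b in BASES} for c in BASES}
coding = {c: {b: 0.25 for b in BASES} for c in BASES}
coding["A"] = {"A": 0.1, "C": 0.5, "G": 0.2, "T": 0.2}
coding["C"] = {"A": 0.2, "C": 0.1, "G": 0.5, "T": 0.2}

score = llr_score("ACG", coding, background, order=1)
```

A positive score means the sequence is more probable under the coding table than under the background table; because the score is a sum of logs, scores of adjacent segments can simply be added.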
“…This new version maintains the hierarchical structure (signal to exon to gene) in the original GeneID, but we have simplified the scoring schema and furnished it with a probabilistic meaning: Scores for both exon-defining signals and protein-coding potential are computed as log-likelihood ratios, which for a given predicted exon are summed up into the exon score, in consequence also a log-likelihood ratio. Then, a dynamic programming algorithm (Guigó 1998) is used to search the space of predicted exons to assemble the gene structure (in the general case, multiple genes in both strands) maximizing the sum of the scores of the assembled exons, which can also be assumed to be a log-likelihood ratio. Execution time in this new version of GeneID grows linearly with the size of the input sequence, currently at ∼2 Mb per minute in a Pentium III (500 MHz) running linux.…”
mentioning
confidence: 99%
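
The exon assembly step described in the statement above (choosing a set of predicted exons that maximizes the summed score) can be sketched as a simplified chaining problem. This is not the published algorithm: it assumes a single strand, ignores frame compatibility and exon types, and treats any two non-overlapping exons as compatible. The names `Exon` and `assemble` are hypothetical. After the two sorts, the sweep visits each exon a constant number of times, which is the idea behind the linear running time.

```python
from typing import List, NamedTuple, Optional


class Exon(NamedTuple):
    start: int
    end: int      # inclusive, with start <= end
    score: float  # e.g. a log-likelihood ratio


def assemble(exons: List[Exon]) -> List[Exon]:
    """Return the highest-scoring chain of non-overlapping exons."""
    n = len(exons)
    if n == 0:
        return []
    by_start = sorted(range(n), key=lambda i: exons[i].start)
    by_end = sorted(range(n), key=lambda i: exons[i].end)
    best = [0.0] * n                        # best chain score ending in exon i
    prev: List[Optional[int]] = [None] * n  # predecessor of exon i in that chain
    done_score, done_idx = 0.0, None        # best chain ending left of the sweep
    j = 0
    for i in by_start:
        # Absorb every exon that ends strictly before exon i starts.
        while j < n and exons[by_end[j]].end < exons[i].start:
            k = by_end[j]
            if best[k] > done_score:
                done_score, done_idx = best[k], k
            j += 1
        best[i] = exons[i].score + done_score
        prev[i] = done_idx
    last = max(range(n), key=best.__getitem__)
    if best[last] <= 0:
        return []  # no chain scores better than predicting nothing
    chain = []
    cur: Optional[int] = last
    while cur is not None:
        chain.append(exons[cur])
        cur = prev[cur]
    return chain[::-1]


candidates = [Exon(1, 10, 2.0), Exon(5, 20, 3.5), Exon(12, 30, 1.0),
              Exon(25, 40, 2.0), Exon(35, 50, 0.5)]
gene = assemble(candidates)
```

For these five candidate exons the sketch selects the exons at 5-20 and 25-40, with a total score of 5.5. Any exon absorbed by the inner loop has already been scored, because an exon that ends before position x also starts before x and so appears earlier in the start-sorted order.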