2014
DOI: 10.1186/1471-2105-15-189

Assessment and refinement of eukaryotic gene structure prediction with gene-structure-aware multiple protein sequence alignment

Abstract: Background: Accurate computational identification of eukaryotic gene organization is a long-standing problem. Despite the fundamental importance of precise annotation of genes encoded in newly sequenced genomes, the accuracy of predicted gene structures has not been critically evaluated, mostly due to the scarcity of proper assessment methods. Results: We present a gene-structure-aware multiple sequence alignment method for gene prediction using amino acid sequences translated from homologous genes from many genome…

citations
Cited by 49 publications
(25 citation statements)
references
References 51 publications
“…Importantly, the sole reliance on gene models would have resulted in the homologous relationships for over 30% of the genes going unfound. Central to this is the new crop of protein-to-genome aligners that are refining long established genome annotations [33, 66].…”
Section: Discussion (citation type: mentioning)
confidence: 99%
“…Recently, we have access to huge amounts of sequence data from widely divergent organisms, but the quality of the data is not always high because of the limitations of sequencing technologies. In the case of amino acid sequence data, the difficulty in eukaryotic gene prediction [28–30] also results in errors in data. It might be possible to automatically exclude such problematic data in certain cases, but sometimes, biologically important information is in low-quality sequences, especially when interest is in nonmodel organisms.…”
Section: Interactive Sequence Choice and Visualization (citation type: mentioning)
confidence: 99%
“…Unfortunately, the quality of the sequences is not always high, partly due to limitations in sequencing technologies. Moreover, at the amino acid sequence level, a number of errors can be introduced due to difficulty in gene prediction (Brent, 2005; Gotoh et al., 2014; Nagy and Patthy, 2013; Yandell and Ence, 2012). With incorrect reading frames, unrelated amino acid segments can appear in a set of homologous sequences.…”
Section: Introduction (citation type: mentioning)
confidence: 99%