2020
DOI: 10.1111/jpy.12947
Evidence That Inconsistent Gene Prediction Can Mislead Analysis of Dinoflagellate Genomes

Abstract: Comparative algal genomics often relies on predicted genes from de novo assembled genomes. However, the artifacts introduced by different gene-prediction approaches, and their impact on comparative genomic analysis, remain poorly understood. Here, using available genome data from six dinoflagellate species in the Symbiodiniaceae, we identified methodological biases in the published genes that were predicted using different approaches and putative contaminant sequences in the published genome assemblies. We deve…

citations
Cited by 47 publications
(66 citation statements)
references
References 13 publications
“…As expected, the number of predicted genes in all six (haploid) genomes of Symbiodiniaceae is roughly similar to the rough estimate of haploid gene number for the two P. glacialis genomes (Table 2 and Additional file 3: Supplementary Table 12). The proportion of genes predicted in P. glacialis that are supported by transcriptome evidence (~94% for each isolate; Table 2) is much higher than in the Symbiodiniaceae isolates (~79% averaged among six genomes [39]). This result may be explained by the more extensive transcriptome data we generated in this study (using both RNA-Seq short-read and Iso-Seq full-length transcripts) to guide our gene prediction workflow (see the 'Methods' section), compared to the transcriptome data (based on RNA-Seq short-reads) available for the other isolates.…”
Section: DinoSL In Full-length Transcripts Of Polarella Glacialis (mentioning)
confidence: 93%
“…Prediction of protein-coding genes in Polarella glacialis is likely impacted by RNA editing. Using a gene-prediction workflow customised for dinoflagellate genomes [39] (see the 'Methods' section), we predicted 58,232 and 51,713 protein-coding genes (hereinafter genes) in the CCMP1383 and CCMP2088 genomes, respectively (Table 2 and Additional file 3: Supplementary Table 12). Of the 58,232 genes predicted in CCMP1383, 51,640 (88.68%) of the encoded proteins were recovered in CCMP2088 (Fig.…”
Section: DinoSL In Full-length Transcripts Of Polarella Glacialis (mentioning)
confidence: 99%