2022
DOI: 10.1101/2022.10.03.510643
Preprint

Welcome to the big leaves: best practices for improving genome annotation in non-model plant genomes

Abstract: Premise of the study: Robust standards to evaluate quality and completeness are lacking for eukaryotic structural genome annotation. Genome annotation software is developed with model organisms and does not typically include benchmarking to comprehensively evaluate the quality and accuracy of the final predictions. Plant genomes are particularly challenging with their large genome sizes, abundant transposable elements (TEs), and variable ploidies. This study investigates the impact of genome quality, complexit…

Cited by 9 publications (11 citation statements)
References 76 publications
“…We also obtained a mono:multi‐exonic gene ratio of 0.24 and an annotation rate of 82–84% based on reciprocal BLAST against NCBI RefSeq Plant or UniProt databases for both haplotypes (Table S1). These results met the recommended metrics for high‐quality genome annotation best practices (Vuruputoor et al., 2023).…”
Section: Results (supporting)
confidence: 58%
“…The gene number discrepancies between Populus genomes are attributable in part to lineage‐specific genes (Vuruputoor et al., 2023; Yates et al., 2021) and to copy number variation of gene family members, especially tandem gene duplicates (Żmieńko et al., 2014). Examples of tandem gene copy number variation between the two 717 haplotypes have recently been reported and include both structural and regulatory genes (Bewg et al., 2022; Chen et al., 2023).…”
Section: Results (mentioning)
confidence: 99%
“…Both GALBA and BRAKER2 tend to heavily overpredict single-exon genes, most likely a result of incorrectly splitting genes. For plants, a desired mono- to multi-exonic gene ratio of 0.2 was recently postulated by [44]. This particular ratio certainly does not hold for non-plant species, and also the reference annotations of plants used in this manuscript often deviated from that recommendation.…”
Section: Discussion (mentioning)
confidence: 91%
“…For Coix aquatica, we used the poales odb10. Further, we report basic metrics such as the number of predicted genes, the number of transcripts, the recently suggested mono-exonic to multi-exonic gene ratio [44], and the maximum number of exons per gene across all predicted genes.…”
Section: Prediction Quality Estimation (mentioning)
confidence: 99%
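
The mono-exonic to multi-exonic gene ratio mentioned in the statements above can be illustrated with a minimal sketch. This is not the procedure used by the cited authors. It assumes a GFF3-style annotation in which `mRNA` or `transcript` features point to their gene through `Parent`, and `exon` features point to their transcript through `Parent`. It also assumes that a gene counts as multi-exonic when any of its transcripts has more than one exon, which is only one of several possible definitions. The function name `exon_metrics` and the example rows are hypothetical.

```python
from collections import defaultdict


def parse_attributes(field):
    """Parse a GFF3 attribute column such as 'ID=x;Parent=y' into a dict."""
    attrs = {}
    for part in field.strip().split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def exon_metrics(gff3_lines):
    """Return gene count, mono:multi-exonic ratio, and max exons per gene.

    Assumption: a gene is mono-exonic if its exon-richest transcript has
    exactly one exon, and multi-exonic if any transcript has more than one.
    """
    transcript_gene = {}            # transcript ID -> gene ID
    exon_counts = defaultdict(int)  # transcript ID -> number of exons
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # skip malformed rows
        ftype, attrs = cols[2], parse_attributes(cols[8])
        if ftype in ("mRNA", "transcript") and "ID" in attrs and "Parent" in attrs:
            transcript_gene[attrs["ID"]] = attrs["Parent"]
        elif ftype == "exon" and "Parent" in attrs:
            # An exon may be shared by several transcripts (comma-separated).
            for parent in attrs["Parent"].split(","):
                exon_counts[parent] += 1

    max_exons = defaultdict(int)    # gene ID -> exons in its richest transcript
    for tx, gene in transcript_gene.items():
        max_exons[gene] = max(max_exons[gene], exon_counts.get(tx, 0))

    mono = sum(1 for n in max_exons.values() if n == 1)
    multi = sum(1 for n in max_exons.values() if n > 1)
    return {
        "genes": len(max_exons),
        "mono_exonic": mono,
        "multi_exonic": multi,
        "mono_multi_ratio": mono / multi if multi else None,
        "max_exons_per_gene": max(max_exons.values(), default=0),
    }


def _row(ftype, attrs):
    """Build one tab-separated GFF3 row with placeholder coordinates."""
    return "\t".join(["chr1", "example", ftype, "1", "100", ".", "+", ".", attrs])


# Hypothetical toy annotation: g1 has one exon, g2 has three, g3 has two.
EXAMPLE = [
    "##gff-version 3",
    _row("gene", "ID=g1"), _row("mRNA", "ID=t1;Parent=g1"),
    _row("exon", "Parent=t1"),
    _row("gene", "ID=g2"), _row("mRNA", "ID=t2;Parent=g2"),
    _row("exon", "Parent=t2"), _row("exon", "Parent=t2"), _row("exon", "Parent=t2"),
    _row("gene", "ID=g3"), _row("mRNA", "ID=t3;Parent=g3"),
    _row("exon", "Parent=t3"), _row("exon", "Parent=t3"),
]

if __name__ == "__main__":
    print(exon_metrics(EXAMPLE))
```

On this toy input, one gene is mono-exonic and two are multi-exonic, so the ratio is 0.5. A real annotation would be read from a file and compared against the roughly 0.2 ratio that the statements above attribute to the cited work for plants.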
“…Despite advances in genome assembly methods, genome annotation remains one of the most challenging bottlenecks facing plant genome science, with intron length variation, divergent TE dynamics, and low sequence conservation hampering the annotation efforts of non‐model genome projects. In their contribution, Vuruputoor et al. (2023) address the need to improve quantification of structural genome annotation methods, employing a mixture of existing and emerging metrics to benchmark genome annotation methods. They approach the issue in a robust manner by using a broad diversity of taxa with challenging genomic features such as variable ploidy, high TE content, and large genomes.…”
mentioning
confidence: 99%