2021
DOI: 10.1101/2021.02.04.429837
Preprint

FINDER: An automated software package to annotate eukaryotic genes from RNA-Seq data and associated protein sequences

Abstract: Background: Gene annotation in eukaryotes is a non-trivial task that requires meticulous analysis of accumulated transcript data. Challenges include transcriptionally active regions of the genome that contain overlapping genes, genes that produce numerous transcripts, transposable elements and numerous diverse sequence repeats. Currently available gene annotation software applications depend on pre-constructed full-length gene sequence assemblies which are not guaranteed to be error-free. The origins of these …

citations
Cited by 4 publications
(2 citation statements)
references
References 176 publications
(200 reference statements)
“…To our knowledge, this is the first time that a comparison has been made for 100+ eukaryote genomes using these three annotation approaches, with previous assessments ranging from 7-12 genomes (Levy Karin et al, 2020; Bruna et al, 2021; Banerjee et al, 2021). This computationally intensive task (18,461 CPU hrs for 48 genomes with length 100-400 Mbp; 8,879 CPU hours for 14 genomes ≥400 Mbp) is achieved in relatively short time-scales through the aggressive use of parallelization and optimization by EukMetaSanity to manage the resources distributed to compute nodes on an HPC system (Figure S2; Supplemental Data 2).…”
Section: Main (mentioning)
confidence: 94%
“…When heterogeneous extrinsic evidence sources are available, some genome annotation tools like MAKER2 [20] and GeMoMa [21] integrate these different sources directly into the annotation protocol. Some, like the recent FINDER [22], perform protein-spliced alignments only with proteins that are mapped to genes missed by RNA-seq-based methods. On the other hand, FINDER does not use RNA-seq evidence to assess or compare homology-based gene models.…”
Section: Introduction (mentioning)
confidence: 99%