2022
DOI: 10.1101/2022.09.08.507143
Preprint

Integrating gene annotation with orthology inference at scale

Abstract: Annotating coding genes and inferring orthologs are two classical challenges in genomics and evolutionary biology that have traditionally been approached separately, which limits scalability. We present TOGA, the first method that integrates gene annotation and orthology inference. TOGA implements a novel paradigm to infer orthologous genes, improves ortholog detection and annotation completeness compared to state-of-the-art methods, and handles even highly fragmented assemblies. TOGA scales to hundreds of gen…

Cited by 33 publications (53 citation statements)
References 88 publications
“…We sought to evaluate the presence of convergent genomic specializations between vocal learning mammals using new datasets and computational approaches, focusing on bats as an attractive mammalian model of vocal complexity (6)(7)(8)(9)(10)(11)(12)(13). Specifically, we used protein-coding sequences from genomes generated by the Zoonomia Consortium (14,15) and models of evolutionary rate convergence (16) to identify genes repeatedly associated with the evolution of vocal learning. Motivated by the finding of protein-level convergence, we next profiled open chromatin specializations of multiple brain regions and somatic tissues in the Egyptian fruit bat, a bat with robust vocal plasticity (10)(11)(12), to identify vocalization-associated epigenomic specializations.…”
Section: Main Text (mentioning)
confidence: 99%
“…To explore the possibility of shared genomic specializations associated with vocal learning, we first used new protein-coding alignments for hundreds of mammals (14) to identify genes whose rates of evolution differ between vocal learners and other mammals, and which may thus be under evolutionary selection related to vocal learning (16). We analyzed 16,209 high-quality gene alignments across 175 boreoeutherian mammals, including 25 vocal learning species (Materials and Methods).…”
Section: Main Text (mentioning)
confidence: 99%
“…We used TOGA (Tool to infer Orthologs from Genome Alignments) [34] with human and mouse as references to annotate genes in the Nile rat genome. In addition to providing gene annotations, TOGA distinguishes between intact genes and genes with missing sequences or inactivating mutations, which can be used to evaluate the quality of genome assemblies.…”
Section: Nile Rat Assembly Is Highly Complete Contiguous and Accurate (mentioning)
confidence: 99%
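The idea in the passage above, that a per-gene intact/missing/inactivated classification can serve as an assembly-quality signal, can be sketched as a simple tally. This is a minimal illustration, not TOGA's own reporting: the three status labels and the function name `completeness_summary` are hypothetical simplifications, and TOGA's actual output uses its own, finer classification scheme.

```python
from collections import Counter

# Hypothetical, simplified status labels for illustration only; TOGA's real
# output uses its own classification scheme.
STATUS_CLASSES = ("intact", "missing", "inactivated")


def completeness_summary(statuses):
    """Return the percentage of projected reference genes in each status class.

    `statuses` maps a gene identifier to one of STATUS_CLASSES.
    """
    total = len(statuses)
    if total == 0:
        raise ValueError("no genes to summarize")
    counts = Counter(statuses.values())
    return {label: 100.0 * counts[label] / total for label in STATUS_CLASSES}


# Toy example with four genes (made-up data, not from any real assembly).
example = {"GENE1": "intact", "GENE2": "intact", "GENE3": "missing", "GENE4": "inactivated"}
print(completeness_summary(example))  # {'intact': 50.0, 'missing': 25.0, 'inactivated': 25.0}
```

A higher share of intact genes, relative to missing ones, is the kind of signal the quoted authors describe using to judge assembly completeness.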
“…We used TOGA [34] to project protein coding genes from human and house mouse to the Nile rat genome. Overall, 99.7% of TOGA annotated genes in the paternal assembly are also annotated in the maternal assembly using mouse gene models; when using human gene models the number is 96.6%.…”
Section: Differences In Protein Coding Gene Content Between Nile Rat ... (mentioning)
confidence: 99%
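The 99.7% and 96.6% figures above are shared-annotation percentages between the two haplotype assemblies. A minimal sketch of that arithmetic follows, assuming each assembly's annotated genes are available as a set of identifiers; the function name and the gene sets are hypothetical.

```python
def shared_percentage(query_genes, other_genes):
    """Percentage of genes in `query_genes` that are also present in `other_genes`.

    The measure is asymmetric: it is normalized by the size of `query_genes`.
    """
    if not query_genes:
        raise ValueError("query gene set is empty")
    return 100.0 * len(query_genes & other_genes) / len(query_genes)


# Made-up gene identifiers, for illustration only.
paternal = {"GeneA", "GeneB", "GeneC", "GeneD"}
maternal = {"GeneA", "GeneB", "GeneC", "GeneE"}
print(shared_percentage(paternal, maternal))  # 75.0
```

Because the denominator is the paternal gene set, swapping the arguments can give a different number whenever the two sets differ in size.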
“…We applied TOGA using the human GRCh38 assembly and the human GENCODE V38 gene annotation as reference to genomes of cetaceans (query species) (91). Briefly, TOGA uses pairwise genome alignment chains and machine learning to infer orthologous loci for each transcript in the reference annotation, utilizing that orthologous genes exhibit alignments in intronic and flanking intergenic regions.…”
Section: TOGA Application (mentioning)
confidence: 99%
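The principle described above, that orthologous loci also align in intronic and flanking intergenic regions, can be illustrated with a simplified coverage computation. This is a sketch under stated assumptions, not TOGA's actual feature set or classifier: it assumes a chain has already been reduced to non-overlapping ungapped aligned blocks in reference coordinates, that exons, introns and flanks are given as non-overlapping half-open intervals, and the function names are hypothetical.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length of the intersection of two half-open intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def aligned_fraction(blocks, regions):
    """Fraction of bases in `regions` covered by aligned `blocks`.

    Assumes blocks do not overlap one another and regions do not overlap
    one another, so pairwise overlaps can simply be summed.
    """
    total = sum(end - start for start, end in regions)
    if total == 0:
        return 0.0
    covered = sum(overlap(bs, be, rs, re) for bs, be in blocks for rs, re in regions)
    return covered / total


# Toy transcript layout in reference coordinates (made-up numbers).
exons = [(100, 200), (300, 400)]
introns = [(200, 300)]
flanks = [(0, 100), (400, 500)]
# Hypothetical ungapped blocks from one alignment chain.
blocks = [(50, 150), (180, 320), (350, 420)]

for name, regions in (("exon", exons), ("intron", introns), ("flank", flanks)):
    print(name, aligned_fraction(blocks, regions))
# exon 0.7, intron 1.0, flank 0.35
```

Intuitively, a chain that covers introns and flanks as well as exons looks like the orthologous locus, whereas a chain aligning almost only to exons is more consistent with, for example, an intron-less processed copy elsewhere in the genome. In TOGA such signals feed a trained classifier rather than a fixed threshold.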