2020
DOI: 10.1093/bioinformatics/btaa1044

Helixer: cross-species gene annotation of large eukaryotic genomes using deep learning

Abstract: Motivation: Current state-of-the-art tools for the de novo annotation of genes in eukaryotic genomes have to be specifically fitted for each species and still often produce annotations that can be improved much further. The fundamental algorithmic architecture for these tools has remained largely unchanged for about two decades, limiting learning capabilities. Here, we set out to improve the cross-species annotation of genes from DNA sequence alone with the help of deep learning. The goal is t…

Cited by 41 publications (37 citation statements). References 31 publications (25 reference statements).
“…Since SS and other regulatory motifs may be conserved across similar species [ 20 ], some work has been done to try to transfer models trained on model organisms to related organisms, for example between different vertebrate genomes [ 66 ]. Others have built cross-species models for specific clades, such as animals or plants using Helixer [ 67 ], but unfortunately the source code for this program is not yet stable (according to the authors). The aim of our work was to extend the idea of cross-species models to conceive universal SS prediction models (one for each SS), that are applicable to a wider range of organisms.…”
Section: Discussion · mentioning · confidence: 99%
“…Consistent structural gene annotations were generated for each species with Helixer (Stiehler et al. 2020) using the hybrid convolutional and bidirectional long short-term memory model, , specifically the trained instance of . This was followed by post-processing the raw predictions into final primary gene models with Helixer Post (Bolger 2022, personal communication).…”
Section: Methods · mentioning · confidence: 99%
“…Transition weights. The base-wise probabilities output by the original version of Helixer [Stiehler et al, 2020] were scored against the references with the categorical cross-entropy loss function, and the penalty for each base pair varied at most by the setting of class weights (to compensate for class imbalance). This setup does not fully reflect the biological significance of different mistakes made by the network; and modifications to the Helixer architecture presented here address this.…”
Section: Improved Reflection Of Biological Importance Of Predictive T… · mentioning · confidence: 99%
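The class-weighted loss described in the statement above can be illustrated with a minimal plain-Python sketch of categorical cross-entropy over per-base class probabilities. The four class names, the example weights, and the choice to average over bases are assumptions for illustration, not Helixer's actual encoding or configuration. The sketch makes the cited limitation concrete: the penalty for a base depends only on its true class and that class's weight, not on where the error sits in the gene model.

```python
import math

# Hypothetical per-base class order (assumption, not taken from Helixer's code).
CLASSES = ("intergenic", "UTR", "CDS", "intron")


def weighted_categorical_cross_entropy(y_true, y_pred, class_weights, eps=1e-7):
    """Mean over bases of -w[c] * log(p[c]) for each base's true class c.

    y_true: list of one-hot rows (one row per base).
    y_pred: list of predicted probability rows (one row per base).
    class_weights: one weight per class, e.g. to offset class imbalance.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    total = 0.0
    for t_row, p_row in zip(y_true, y_pred):
        for t, p, w in zip(t_row, p_row, class_weights):
            # Clip probabilities so log(0) cannot occur.
            total -= w * t * math.log(max(p, eps))
    return total / len(y_true)


# Two bases: one intergenic, one CDS; the CDS class is up-weighted 2x.
y_true = [[1, 0, 0, 0], [0, 0, 1, 0]]
y_pred = [[0.7, 0.1, 0.1, 0.1], [0.2, 0.1, 0.6, 0.1]]
print(weighted_categorical_cross_entropy(y_true, y_pred, [1, 1, 2, 1]))
```

Because the weight is attached to a class rather than to a position, a CDS base mispredicted at a splice junction costs exactly as much as one mispredicted mid-exon, which is the gap the quoted modification aims to close.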
“…For perceptibility, differences between the genic F1 of models are displayed as a Z-score (middle). Models marked with * used the same training species as in Stiehler et al [2020]. Grey indicates data is not available.…”
mentioning · confidence: 99%
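The statement above does not say exactly how the genic F1 or its Z-score are computed. The following is a minimal sketch under stated assumptions: F1 is computed from true-positive, false-positive, and false-negative base counts pooled over the genic classes, and the Z-score is the standard (x - mean) / sample standard deviation across the models being compared. The function names are hypothetical.

```python
import statistics


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall from raw counts (0.0 if undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def z_scores(values):
    """Standardize values: (x - mean) / sample stdev. Needs at least two values."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


# Hypothetical genic F1 values for three models on one species.
f1s = [0.81, 0.85, 0.89]
print(z_scores(f1s))
```

Subtracting a common baseline model's F1 from every value does not change these Z-scores, because standardization removes any constant shift. Z-scoring "differences" therefore only matters if the differences are scaled or pooled in some other way, which the statement does not specify.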