2019
DOI: 10.1101/660563
Preprint

Cross-species regulatory sequence activity prediction

Abstract: Machine learning algorithms trained to predict the regulatory activity of nucleic acid sequences have revealed principles of gene regulation and guided genetic variation analysis. While the human genome has been extensively annotated and studied, model organisms have been less explored. Model organism genomes offer both additional training sequences and unique annotations describing tissue and cell states unavailable in humans. Here, we develop a strategy to train deep convolutional neural networks simultaneou…

Cited by 19 publications (38 citation statements)
References 42 publications
“…The value of deep learning models for prioritizing rare pathogenic variants has been questioned in a recent analysis focusing on Human Gene Mutation Database (HGMD) variants 40, meriting further investigation. Second, our analyses of allelic-effect annotations are restricted to unsigned analyses, but signed analyses have also proven valuable in linking deep learning annotations to molecular traits and complex disease 16,41,42; however, genome-wide signed relationships are unlikely to hold for the regulatory marks (DNase and histone marks) that we focus on here, which do not correspond to specific genes or pathways. Third, we focused here on deep learning models trained to predict specific regulatory marks, but deep learning models have also been used to predict a broader set of regulatory features, including gene expression levels and cryptic splicing 15,16,39, that may be informative for complex disease.…”
Section: Discussion (mentioning)
confidence: 99%
“…Third, we focused here on deep learning models trained to predict specific regulatory marks, but deep learning models have also been used to predict a broader set of regulatory features, including gene expression levels and cryptic splicing 15,16,39, that may be informative for complex disease. We have also not considered the application of deep learning models to TFBS, CAGE and ATAC-seq data 16,42, which is a promising future research direction. Fourth, we focused here on deep learning models trained using human data, but models trained using data from other species may also be informative for human disease 43,42.…”
Section: Discussion (mentioning)
confidence: 99%
“…Second, we applied a 'head' that transforms the 1D representations to 2D for Hi-C prediction. We implemented the model using the Basenji software 16,17, which is written in Tensorflow 40 and Keras 41.…”
Section: Model Architecture (mentioning)
confidence: 99%
“…The Akita architecture consists of a 'trunk' based on the Basenji 16,17 architecture to obtain 1D representations of genomic sequence, followed by a 'head' to transform to 2D maps of genome folding (Fig. 1a, Methods).…”
mentioning
confidence: 99%