2021
DOI: 10.1101/2021.07.06.451243
Preprint

Predicting RNA splicing from DNA sequence using Pangolin

Abstract: Recent progress in deep learning approaches has greatly improved the prediction of RNA splicing from DNA sequence. Here, we present Pangolin, a deep learning model to predict splice site strength in multiple tissues that has been trained on RNA splicing and sequence data from four species. Pangolin outperforms state-of-the-art methods for predicting RNA splicing on a variety of prediction tasks. We use Pangolin to study the impact of genetic variants on RNA splicing, including lineage-specific variants and ra…

Cited by 4 publications (7 citation statements)
References 34 publications
“…For instance, end-to-end prediction is generally recommended to leverage the full power of Deep Learning. Others have shown that Deep Learning can effectively directly predict transitions such as splice sites [Zeng and Li, 2022, Zhang et al, 2016, Wang et al, 2019]. It is conceptually possible to encode a full gene structure as a series of transition tokens and positions, and then predict structure with a many-to-many model architecture such as those used for large language modeling [Brown et al, 2020, Vaswani et al, 2017].…”
Section: Discussion (mentioning)
confidence: 99%
“…Currently, we only applied SpliceBERT to predict splice sites and branchpoints independent of tissue or cell types, while tissue/cell type-specific alternative splicing cannot be ignored because they contributed a lot to transcriptomic and proteomic diversity (43). In fact, predicting tissue/cell type-specific splicing is still challenging solely from sequences (10,44). We may combine additional tasks like tissue-specific splice site strength (10) and RNA binding protein binding sites (45) to improve the model's ability for tissue-specific prediction.…”
Section: Discussion (mentioning)
confidence: 99%
“…In fact, predicting tissue/cell type-specific splicing is still challenging solely from sequences (10,44). We may combine additional tasks like tissue-specific splice site strength (10) and RNA binding protein binding sites (45) to improve the model's ability for tissue-specific prediction.…”
Section: Discussion (mentioning)
confidence: 99%
“…Splicing patterns are directly influenced by the nucleotide sequences surrounding splice sites [29]. This has enabled the development of many algorithms to predict alternative splicing from RNA-seq [30][31][32] or DNA sequence [33][34][35][36][37]. Beyond human clinical applications, methods that require only DNA sequence can be leveraged to understand alternative splicing in extant species for which acquiring RNA-seq data may be difficult to impossible or in extinct taxa, such as archaic hominins.…”
Section: Introduction (mentioning)
confidence: 99%