2021
DOI: 10.1101/2021.04.27.441365
Preprint

Prediction of RNA-protein interactions using a nucleotide language model

Abstract: Motivation: The accumulation of sequencing data has enabled researchers to predict the interactions between RNA sequences and RNA-binding proteins (RBPs) using novel machine learning techniques. However, existing models are often difficult to interpret and require additional information beyond sequences. Bidirectional encoder representations from Transformer (BERT) is a language-based deep learning model that is highly interpretable. Therefore, a model based on BERT architecture can potentially overcome such limitat…

Cited by 6 publications (5 citation statements)
References 73 publications (78 reference statements)

“…Wei et al [38] have provided a review of the use of deep learning in RNA-protein interaction prediction. Yamada et al [39] have developed a method to accurately identify RNA sequences that interact with a particular protein by using the DNABERT model [40] that is pre-trained using the human genome. Although our method does not use deep learning, we expect to achieve higher accuracy in prediction by using a pretrained BERT model, which could be improved through the application of deep learning relatively easily.…”
Section: Discussion (mentioning)
confidence: 99%
“…The $i$-th self-attention head is computed as $\mathrm{head}_i = \mathrm{softmax}\!\bigl(QK^{\top}/\sqrt{d_k}\bigr)\,V$, where the softmax term $\mathrm{Attention}_h = \{a_{ij}\}$ is a scoring matrix in which $a_{ij}$ denotes the attention weight that the Query token $t_i$ gets from the Key token $t_j$. This matrix is widely used for representing and exploring the binding between tokens (33, 49, 56).…”
Section: Methods (mentioning)
confidence: 99%
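
As an illustration of the scoring matrix described in the statement above, here is a minimal NumPy sketch of scaled dot-product attention weights; the single-head, unmasked setting and all variable names are illustrative assumptions, not taken from the cited work.

```python
import numpy as np

def attention_scores(Q, K, d_k):
    """Scaled dot-product attention weights a_ij: how strongly Query token t_i
    attends to Key token t_j (each row sums to 1 after the softmax)."""
    logits = Q @ K.T / np.sqrt(d_k)               # (L, L) raw scores
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)

# Toy example: L = 4 tokens, d_k = 8
rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))
K = rng.standard_normal((4, 8))
A = attention_scores(Q, K, d_k=8)   # A[i, j] corresponds to a_ij
print(A.round(3))
```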
“…Natural language inference: BioELMo [93], BlueBERT [179], Sharma et al [212], He et al [78], Zhu et al [302]. Protein and DNA sequence: [9], [80], [197], [202], MSA Transformer [198], ProtTrans [54], ProGen [146], DNABERT [90], [275]. …feasible to make use of unlabelled protein data. In detail, Alphafold2 adopts an auxiliary BERT-like loss to predict pre-masked residues in multiple sequence alignments (MSAs).…”
Section: Question Answering (mentioning)
confidence: 99%
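
The auxiliary BERT-like loss mentioned in the statement above is a masked-token cross-entropy restricted to positions that were masked before the forward pass. A minimal NumPy sketch follows; the shapes and the 21-symbol residue vocabulary are placeholder assumptions, not AlphaFold2's actual implementation.

```python
import numpy as np

def masked_lm_loss(logits, targets, mask):
    """Cross-entropy averaged over masked positions only.
    logits:  (L, V) per-position scores over the token vocabulary
    targets: (L,)   true token ids
    mask:    (L,)   True where the input token was masked before the forward pass"""
    logits = logits - logits.max(axis=1, keepdims=True)                   # stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    nll = -log_probs[np.arange(len(targets)), targets]                    # per-position NLL
    return nll[mask].mean()

# Toy example: 6 positions, 21 residue types, 2 masked positions
rng = np.random.default_rng(0)
logits = rng.standard_normal((6, 21))
targets = rng.integers(0, 21, size=6)
mask = np.array([False, True, False, False, True, False])
print(masked_lm_loss(logits, targets, mask))
```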
“…Yamada and Hamada [275] pre-train a BERT on RNA sequences and RNA-binding protein sequences. All the LMs remain largely the same as those used for human language data.…”
Section: Question Answering (mentioning)
confidence: 99%
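
To make concrete what pre-training a BERT on RNA sequences involves on the input side, below is a small sketch of DNABERT-style overlapping k-mer tokenization; whether the cited models use exactly this scheme (and this k) is an assumption made here for illustration only.

```python
def kmer_tokenize(seq, k=3):
    """Split a nucleotide sequence into overlapping k-mers, as in DNABERT-style inputs."""
    seq = seq.upper().replace("U", "T")     # map RNA to the DNA alphabet used for pre-training
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

print(kmer_tokenize("AUGGCUACG"))
# ['ATG', 'TGG', 'GGC', 'GCT', 'CTA', 'TAC', 'ACG']
```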