Maolin Ding scite author profile

Maolin Ding

2 Publications

11 Citation Statements Received

74 Citation Statements Given


Affiliations

Sun Yat-sen University

Publications


Self-supervised learning on millions of pre-mRNA sequences improves sequence-based RNA splicing prediction

Chen, Zhou, Ding et al. 2023

Preprint


RNA splicing is an important post-transcriptional process of gene expression in eukaryotic organisms. Here, we developed a novel language model, SpliceBERT, pre-trained on the precursor messenger RNA sequences of 72 vertebrates to improve sequence-based modelling of RNA splicing. SpliceBERT is capable of generating embeddings that preserve the evolutionary information of nucleotides and functional characteristics of splice sites. Moreover, the pre-trained model can be utilized to prioritize potential splice-disrupting variants in an unsupervised manner based on genetic variants' impact on the output of SpliceBERT for sequence context. Benchmarked on a multi-species splice site and a human branchpoint prediction task, SpliceBERT outperformed not only conventional baseline models but also other language models pretrained only on the human genome. Our study highlighted the importance of unsupervised learning with genomic sequences of multiple species and indicated that language models were promising approaches to decipher the determinants of RNA splicing.
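The unsupervised variant prioritization described above can be illustrated with a small sketch. It assumes, hypothetically, that a masked language model returns one nucleotide probability distribution per position for both the reference and the variant sequence, and that the variant's impact is summarized as the summed Kullback-Leibler divergence over the surrounding context. The function names, the input layout, and the choice of KL divergence are assumptions made for illustration; the abstract does not specify them.

```python
import math


def kl_divergence(p, q, eps=1e-9):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def variant_context_score(ref_probs, alt_probs, variant_index):
    """Sum per-position KL(ref || alt) over the context, skipping the variant site.

    ref_probs and alt_probs are hypothetical model outputs: one probability
    distribution over (A, C, G, T) per sequence position.
    """
    total = 0.0
    for i, (p, q) in enumerate(zip(ref_probs, alt_probs)):
        if i == variant_index:
            continue  # the substituted base itself is not part of the context
        total += kl_divergence(p, q)
    return total


# Toy distributions standing in for model predictions (not real data).
ref = [[0.7, 0.1, 0.1, 0.1], [0.25, 0.25, 0.25, 0.25], [0.1, 0.1, 0.7, 0.1]]
alt = [[0.4, 0.2, 0.2, 0.2], [0.25, 0.25, 0.25, 0.25], [0.1, 0.1, 0.4, 0.4]]
score = variant_context_score(ref, alt, variant_index=1)
```

A larger score means the variant shifts the model's predictions for neighbouring nucleotides more strongly, which is the intuition behind ranking candidate splice-disrupting variants without labels.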


EVlncRNA-Dpred: improved prediction of experimentally validated lncRNAs by deep learning

Zhou, Ding, Feng et al. 2022


Long non-coding RNAs (lncRNAs) play essential roles in nearly every biological process and disease. Many algorithms were developed to distinguish lncRNAs from mRNAs in transcriptomic data and facilitated discoveries of more than 600 000 lncRNAs. However, only a tiny fraction (<1%) of lncRNA transcripts (~4000) were further validated by low-throughput experiments (EVlncRNAs). Given the cost and labor-intensive nature of experimental validations, it is necessary to develop computational tools to prioritize those potentially functional lncRNAs because many lncRNAs from high-throughput sequencing (HTlncRNAs) could result from transcriptional noise. Here, we employed deep learning algorithms to separate EVlncRNAs from HTlncRNAs and mRNAs. To overcome the challenge of small datasets, we employed a three-layer deep-learning neural network (DNN) with a K-mer feature as the input and a small convolutional neural network (CNN) with one-hot encoding as the input. Three separate models were trained for human (h), mouse (m) and plant (p), respectively. The final concatenated models (EVlncRNA-Dpred (h), EVlncRNA-Dpred (m) and EVlncRNA-Dpred (p)) provided substantial improvement over a previous model based on support-vector-machines (EVlncRNA-pred). For example, EVlncRNA-Dpred (h) achieved 0.896 for the area under the receiver-operating characteristic curve, compared with 0.582 given by the sequence-based EVlncRNA-pred model. The models developed here should be useful for screening lncRNA transcripts for experimental validations. EVlncRNA-Dpred is available as a web server at https://www.sdklab-biophysics-dzu.net/EVlncRNA-Dpred/index.html, and the data and source code are freely available along with the web server.
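The two input representations named in the abstract, a K-mer feature and one-hot encoding, can be sketched as follows. This is a minimal illustration, not the authors' preprocessing: the value k=3, the frequency normalization, the treatment of U as T, and the handling of ambiguous bases are all assumptions, and the function names are hypothetical.

```python
from itertools import product

ALPHABET = "ACGT"


def kmer_frequencies(seq, k=3):
    """Return normalized counts for all 4**k k-mers, in lexicographic order."""
    seq = seq.upper().replace("U", "T")
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows containing ambiguous bases such as N
            counts[window] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


def one_hot(seq):
    """Encode each base as a 4-element indicator row; unknown bases become all zeros."""
    seq = seq.upper().replace("U", "T")
    return [[1 if base == letter else 0 for letter in ALPHABET] for base in seq]


features = kmer_frequencies("ACGTAC", k=3)  # fixed-length vector of 64 values
matrix = one_hot("ACGN")                    # one row per nucleotide
```

The K-mer vector has a fixed length regardless of transcript length, which suits a fully connected network, while the one-hot matrix keeps positional order, which is what a convolutional network operates on.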


