Shengli Zhang scite author profile

RNA 5-hydroxymethylcytosine (5hmC) is an RNA modification involved in the life activities of many organisms, and mapping its distribution is important for revealing its biological function. High-throughput sequencing has been used to identify 5hmC, but it is expensive and inefficient, so machine learning is used to identify 5hmC sites instead. Here, we design a model called R5hmCFDV, which consists of three stages: feature representation, feature fusion and classification. (i) Pseudo dinucleotide composition, dinucleotide binary profile and frequency, natural vector and physicochemical property are used to extract features from four aspects: nucleotide composition, coding, natural language and physicochemical properties. (ii) To strengthen the relevance of the features, we construct a novel feature fusion method. First, an attention mechanism processes the four individual features, which are then concatenated and fed to a convolution layer. The convolution output is then processed by a BiGRU and a BiLSTM separately. Finally, the outputs of these two branches are fused by a multiply function. (iii) We design a deep voting algorithm for classification by imitating the soft voting mechanism in the Python package. The base classifiers are a deep neural network (DNN), a convolutional neural network (CNN) and an improved gated recurrent unit (GRU). Following the principle of soft voting, a weight is assigned to each of the three classifiers; each predicted probability is multiplied by the corresponding weight, and the products are summed to obtain the final prediction. We evaluate the model with 10-fold cross-validation, and the evaluation indicators are significantly improved. The prediction accuracy on the two datasets reaches 95.41% and 93.50%, respectively, which demonstrates the competitiveness and generalization performance of our model.
In addition, all datasets and source code can be found at https://github.com/HongyanShi026/R5hmCFDV.
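The weighted soft voting step described in (iii) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `soft_vote` and `predict_labels`, the example probabilities, the weights, the weight normalization and the 0.5 decision threshold are all assumptions made for illustration.

```python
from typing import List, Sequence


def soft_vote(probabilities: Sequence[Sequence[float]],
              weights: Sequence[float]) -> List[float]:
    """Combine per-classifier probabilities by a weighted sum.

    probabilities[i][j] is classifier i's predicted probability that
    sample j is a positive site. weights[i] is the weight of classifier i.
    Normalizing the weights to sum to 1 is an assumption of this sketch.
    """
    if len(probabilities) != len(weights):
        raise ValueError("need exactly one weight per classifier")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    normalized = [w / total for w in weights]
    n_samples = len(probabilities[0])
    # Multiply each probability by its classifier weight, then sum.
    return [
        sum(w * p[j] for w, p in zip(normalized, probabilities))
        for j in range(n_samples)
    ]


def predict_labels(scores: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Turn combined probabilities into 0/1 labels (threshold is assumed)."""
    return [1 if s >= threshold else 0 for s in scores]


# Hypothetical outputs of the three base classifiers for three samples.
dnn = [0.9, 0.4, 0.2]
cnn = [0.8, 0.6, 0.1]
gru = [0.7, 0.5, 0.3]
weights = [0.5, 0.25, 0.25]  # illustrative weights, not from the paper

scores = soft_vote([dnn, cnn, gru], weights)
labels = predict_labels(scores)
```

In this toy example the second sample is labelled negative even though two of the three classifiers score it at 0.5 or above, because the most heavily weighted classifier scores it lowest.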


An improved residual network using deep fusion for identifying RNA 5-methylcytosine sites

Zhang

Shi

2022


Motivation: 5-Methylcytosine (m5C) is a crucial post-transcriptional modification that, as detection technology has developed, has been found in many types of RNA. Numerous studies indicate that m5C plays an essential role in various activities of organisms, such as tRNA recognition, stabilization of RNA structure and RNA metabolism. Traditional identification by wet-lab experiments is costly and time-consuming, so computational models are commonly used to identify m5C sites. Given the computational advantages of deep learning, it is feasible to build such a predictive model with deep learning algorithms. Results: In this study, we construct a model to identify m5C sites based on a deep fusion approach with an improved residual network. First, sequence features are extracted from the RNA sequences using Kmer, K-tuple nucleotide frequency component (KNFC), pseudo dinucleotide composition (PseDNC) and physical and chemical property (PCP). Kmer and KNFC extract information from a statistical point of view, while PseDNC and PCP extract information from the physicochemical properties of the RNA sequences. Then, the two parts of information are fused into new features using bidirectional long short-term memory (BiLSTM) and attention mechanisms, respectively. The fused features are then fed into the improved residual network for classification. Finally, 10-fold cross-validation and independent set testing are used to verify the credibility of the model. The accuracy reaches 91.87%, 95.55%, 92.27% and 95.60% on the training sets and independent test sets of A. thaliana and M. musculus, respectively. This is a considerable improvement over previous studies and demonstrates the robust performance of our model. Availability: The data and code related to the study are available at https://github.com/alivelxj/m5c-DFRESG. Supplementary information: Supplementary data are available at Bioinformatics online.
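As a rough illustration of the statistical features mentioned above, the following sketch computes normalized k-mer frequencies for one RNA sequence. It is a generic k-mer encoding, not the code from the linked repository: the function name `kmer_frequencies`, the default `k=2`, the T-to-U conversion and the skipping of windows with non-ACGU characters are all assumptions.

```python
from itertools import product
from typing import Dict


def kmer_frequencies(sequence: str, k: int = 2,
                     alphabet: str = "ACGU") -> Dict[str, float]:
    """Return the relative frequency of every possible k-mer in a sequence.

    The result has len(alphabet) ** k entries in a fixed order, so it can
    serve as a fixed-length feature vector for a classifier.
    """
    # Assumption: DNA-style input is converted to the RNA alphabet.
    seq = sequence.upper().replace("T", "U")
    counts = {"".join(p): 0 for p in product(alphabet, repeat=k)}
    total = 0
    # Slide a window of width k over the sequence and count each k-mer.
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows with ambiguous bases such as N
            counts[window] += 1
            total += 1
    if total == 0:
        return {kmer: 0.0 for kmer in counts}
    return {kmer: c / total for kmer, c in counts.items()}


# Hypothetical toy sequence; real inputs would be windows around candidate sites.
freqs = kmer_frequencies("ACGUACGU", k=2)
```

For this toy sequence there are seven dinucleotide windows, so `freqs["AC"]` is 2/7 and `freqs["UA"]` is 1/7, while dinucleotides that never occur get 0.0.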


iORI-ENST: identifying origin of replication sites based on elastic net and stacking learning

Yao

Zhang

Liang

2021

SAR and QSAR in Environmental Research




Shengli Zhang

M6A-GSMS: Computational identification of N⁶-methyladenosine sites with GBDT and stacking learning in multiple species

Pep-CNN: An improved convolutional neural network for predicting therapeutic peptides

R5hmCFDV: computational identification of RNA 5-hydroxymethylcytosine based on deep feature fusion and deep voting

An improved residual network using deep fusion for identifying RNA 5-methylcytosine sites

iORI-ENST: identifying origin of replication sites based on elastic net and stacking learning


M6A-GSMS: Computational identification of N⁶-methyladenosine sites with GBDT and stacking learning in multiple species