N6-methyladenosine (m6A) refers to methylation of the adenosine nucleotide at the nitrogen-6 position. Many conventional computational methods for identifying m6A sites are limited by the small amount of data available. With thousands of m6A sites now detected by high-throughput sequencing, it is possible to discover the characteristics of m6A sequences using deep learning techniques. To the best of our knowledge, our work is the first attempt to use word embedding and deep neural networks for m6A prediction from mRNA sequences. Using four deep neural networks, we developed a model, inferred from a larger sequence shifting window, that predicts m6A sites accurately and robustly. Four prediction schemes were built with various RNA sequence representations and optimized convolutional neural networks. Soft voting over the four deep networks outperformed all of the state-of-the-art methods. We evaluated these predictors on a rigorous independent test data set, and the results showed that the proposed method outperforms the state-of-the-art predictors. The training, independent, and cross-species testing data sets are much larger than those in previous studies, which could help to avoid overfitting. Furthermore, an online prediction web server implementing the four proposed predictors has been built and is available at http://server.malab.cn/Gene2vec/.
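As an illustration of the soft voting step mentioned above, the following minimal Python sketch averages the per-site probabilities produced by several predictors and thresholds the mean. It is not the authors' implementation: the function names `soft_vote_probs` and `soft_vote_labels`, the 0.5 threshold, equal weighting of the models, and the example probabilities are all assumptions made for illustration.

```python
from typing import List, Sequence


def soft_vote_probs(prob_sets: Sequence[Sequence[float]]) -> List[float]:
    """Average the positive-class probabilities of several models, site by site.

    prob_sets[i][j] is the probability that model i assigns to candidate site j.
    Equal weighting of the models is an assumption of this sketch.
    """
    if not prob_sets:
        raise ValueError("need predictions from at least one model")
    n_sites = len(prob_sets[0])
    if any(len(p) != n_sites for p in prob_sets):
        raise ValueError("all models must score the same candidate sites")
    # zip(*prob_sets) groups the scores that different models gave to one site.
    return [sum(scores) / len(scores) for scores in zip(*prob_sets)]


def soft_vote_labels(prob_sets: Sequence[Sequence[float]],
                     threshold: float = 0.5) -> List[int]:
    """Label a site 1 (predicted m6A) when the mean probability reaches threshold."""
    return [1 if p >= threshold else 0 for p in soft_vote_probs(prob_sets)]


if __name__ == "__main__":
    # Hypothetical outputs of four networks for two candidate sites.
    example = [
        [0.9, 0.2],
        [0.8, 0.4],
        [0.7, 0.1],
        [0.6, 0.3],
    ]
    print(soft_vote_labels(example))  # [1, 0]
```

Averaging probabilities rather than counting hard votes lets a confident model outweigh several uncertain ones, which is the usual motivation for soft voting over majority voting.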
MicroRNA (miRNA) plays an important role as a regulator in biological processes. Identification of (pre-)miRNAs helps in understanding regulatory processes. Machine learning methods have been designed for pre-miRNA identification. However, most of them cannot provide reliable predictive performance on independent testing data sets. We hypothesized that this is because the training sets, especially the negative training sets, are not sufficiently representative. To generate a representative negative set, we proposed a novel negative sample selection technique and collected negative samples of improved quality. Two recent classifiers rebuilt with the proposed negative set achieved an improvement of ~6 percent in their predictive performance, which supports this hypothesis. Based on the proposed negative set, we constructed a training set and developed an online system called miRNApre specifically for human pre-miRNA identification. We showed that miRNApre achieved accuracies on updated human and non-human data sets that were 34.3 and 7.6 percent higher, respectively, than those achieved by current methods. The results suggest that miRNApre is an effective tool for pre-miRNA identification. Additionally, by integrating miRNApre, we developed a miRNA mining tool, mirnaDetect, which can be applied to find potential miRNAs in genome-scale data. On human chromosome 19 data, mirnaDetect achieved mining performance comparable to that of existing methods.