Convolutional neural networks for classification of alignments of non-coding RNA sequences

Aoki, Genta; Sakakibara, Yasubumi

doi:10.1093/bioinformatics/bty228

Cited by 74 publications

(55 citation statements)

References 24 publications

“…particularly adept at recognizing motifs and long-range interactions in nucleotide sequence data [10, 18-20, 40-44]. We trained a CNN on a one-hot sequence input, an LSTM on a one-hot sequence input, and a CNN on a two-dimensional, one-hot complementarity map representation input (see "Methods" for complete descriptions of all models).…”

Section: Results (mentioning)

confidence: 99%

A deep learning approach to programmable RNA switches

et al. 2020

Engineered RNA elements are programmable tools capable of detecting small molecules, proteins, and nucleic acids. Predicting the behavior of these synthetic biology components remains a challenge, a situation that could be addressed through enhanced pattern recognition from deep learning. Here, we investigate Deep Neural Networks (DNN) to predict toehold switch function as a canonical riboswitch model in synthetic biology. To facilitate DNN training, we synthesize and characterize in vivo a dataset of 91,534 toehold switches spanning 23 viral genomes and 906 human transcription factors. DNNs trained on nucleotide sequences outperform (R2 = 0.43–0.70) previous state-of-the-art thermodynamic and kinetic models (R2 = 0.04–0.15) and allow for human-understandable attention-visualizations (VIS4Map) to identify success and failure modes. This work shows that deep learning approaches can be used for functionality predictions and insight generation in RNA synthetic biology.
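As an illustration of the one-hot sequence input mentioned in the citation statement above, the following minimal Python sketch encodes a nucleotide string as one 4-element indicator vector per position. The channel order `ACGU`, the all-zero row for unknown symbols, and the function name `one_hot` are assumptions made for this example, not details taken from the cited papers.

```python
# Minimal one-hot encoder for nucleotide sequences (illustrative sketch).
ALPHABET = "ACGU"  # assumed channel order; the source does not specify one


def one_hot(seq, alphabet=ALPHABET):
    """Return one 0/1 row per nucleotide, with one column per alphabet symbol."""
    seq = seq.upper().replace("T", "U")  # treat DNA 'T' as RNA 'U'
    index = {base: i for i, base in enumerate(alphabet)}
    rows = []
    for base in seq:
        row = [0] * len(alphabet)
        if base in index:  # unknown symbols (e.g. 'N') stay all-zero (assumption)
            row[index[base]] = 1
        rows.append(row)
    return rows


encoded = one_hot("GAUN")
for base, row in zip("GAUN", encoded):
    print(base, row)
```

Each row corresponds to one sequence position, so the encoding keeps the position of every nucleotide, which is the property a convolutional filter relies on when scanning for motifs.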

“…CNN is an essential model of deep learning, and suitable for identifying sequence profiles, due to its excellent feature extraction capability on high-dimensional data (Kelley et al, 2016; Zeng et al, 2016). The input vector of CNN is primarily based on sequence-derived features, such as the frequency of k-mer occurrence applied in this study and one-hot vector strategy (Aoki and Sakakibara, 2018; Fiannaca et al, 2015; Ghandi et al, 2014; Lee et al, 2011; Nguyen et al, 2016). One apparent advantage of the one-hot vector is to reserve specific position information of each individual nucleotide in sequences.…”

Section: Discussion (mentioning)

confidence: 99%

“…One particular deep learning model--Convolutional Neural Network (CNN)--have achieved outstanding performance in image classification, speech recognition, and natural language processing (Krizhevsky et al, 2012; Schmidhuber, 2015). CNN model has also been successfully applied in prediction of unknown sequences profiles or motifs and functional activity discovery, without pre-defining sequence features such as prediction of sequence specificities of DNA- and RNA-binding proteins (Alipanahi et al, 2015), effects of noncoding variants (Zhou and Troyanskaya, 2015), and classification of alignments of noncoding RNA sequences (Alipanahi et al, 2015; Aoki and Sakakibara, 2018; Schmidhuber, 2015; Zeng et al, 2016; Zhou and Troyanskaya, 2015).…”

Section: Introduction (mentioning)

confidence: 99%

DeepTE: a computational method for de novo classification of transposons with convolutional neural network

Yan, Bombarely 2020

Preprint

Motivation: Transposable elements (TEs) classification is an essential step to decode their roles in genome evolution. With a large number of genomes from non-model species becoming available, accurate and efficient TE classification has emerged as a new challenge in genomic sequence analysis. Results: We developed a novel tool, DeepTE, which classifies unknown TEs using convolutional neural networks. DeepTE transferred sequences into input vectors based on k-mer counts. A tree structured classification process was used where eight models were trained to classify TEs into super families and orders. DeepTE also detected domains inside TEs to correct false classification. An additional model was trained to distinguish between non-TEs and TEs in plants. Given unclassified TEs of different species, DeepTE can classify TEs into seven orders, which include 15, 24, and 16 super families in plants, metazoans, and fungi, respectively. In several benchmarking tests, DeepTE outperformed other existing tools for TE classification. In conclusion, DeepTE successfully leverages convolutional neural network for TE classification, and can be used to precisely identify and annotate TEs in newly sequenced eukaryotic genomes.
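The k-mer based input vector described in the DeepTE statements and abstract above can be sketched as follows. This is a minimal illustration, not DeepTE's actual preprocessing: the choice `k=3`, the normalization to frequencies, the skipping of windows with ambiguous bases, and the function name `kmer_frequencies` are all assumptions for the example.

```python
from itertools import product


def kmer_frequencies(seq, k=3, alphabet="ACGT"):
    """Return the relative frequency of every possible k-mer, in lexicographic order."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # windows with ambiguous bases are skipped (assumption)
            counts[window] += 1
            total += 1
    # Fixed-length vector of size len(alphabet) ** k, independent of sequence length.
    return [counts[km] / total if total else 0.0 for km in kmers]


vec = kmer_frequencies("ACGTAC", k=2)
print(len(vec))  # 16 possible 2-mers
print(vec[1])    # frequency of "AC": 2 of 5 windows
```

Unlike a one-hot matrix, this vector has the same length for every sequence but discards where in the sequence each k-mer occurred, which is the trade-off the Discussion statement points to.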

“…In recent years, Convolutional Neural Network (CNN) has been widely used to solve biological problems [22, 27, 28]. The structure of the CNN is shown in Figure 1. It contains a convolutional layer with 200 filters in which the kernel size is 6.…”

Section: Methods (mentioning)

confidence: 99%

im6A-TS-CNN: Identifying the N6-Methyladenine Site in Multiple Tissues by Using the Convolutional Neural Network

Liu, Cao, et al. 2020

Molecular Therapy - Nucleic Acids

N6-methyladenosine (m6A) is the most abundant post-transcriptional modification and involves a series of important biological processes. Therefore, accurate detection of the m6A site is very important for revealing its biological functions and impacts on diseases. Although both experimental and computational methods have been proposed for identifying m6A sites, few of them are able to detect m6A sites in different tissues. With the consideration of the spatial specificity of m6A modification, it is necessary to develop methods able to detect the m6A site in different tissues. In this work, by using the convolutional neural network (CNN), we proposed a new method, called im6A-TS-CNN, that can identify m6A sites in brain, liver, kidney, heart, and testis of Homo sapiens, Mus musculus, and Rattus norvegicus. In im6A-TS-CNN, the samples were encoded by using the one-hot encoding scheme. The results from both a 5-fold cross-validation test and independent dataset test demonstrate that im6A-TS-CNN is better than the existing method for the same purpose. The command-line version of im6A-TS-CNN is available at https://github.com/liukeweiaway/DeepM6A_cnn .
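To make the layer described in the Methods statement above concrete (a convolutional layer with 200 filters and kernel size 6 applied to one-hot input), here is a small pure-Python sketch of a "valid" 1D convolution. Only the filter count and kernel size come from the quoted text; the input length of 41, the random weights, the ReLU activation, and all function names are assumptions for illustration, and this is not the im6A-TS-CNN implementation.

```python
import random


def random_one_hot(length, channels=4, seed=0):
    """Build a random one-hot matrix (length x channels) as stand-in input."""
    rng = random.Random(seed)
    rows = []
    for _ in range(length):
        row = [0.0] * channels
        row[rng.randrange(channels)] = 1.0
        rows.append(row)
    return rows


def conv1d_valid(x, kernels, biases):
    """Slide each kernel (kernel_size x channels) over x without padding."""
    k = len(kernels[0])
    out = []
    for start in range(len(x) - k + 1):
        window = x[start:start + k]
        row = []
        for kernel, bias in zip(kernels, biases):
            total = bias
            for w_row, x_row in zip(kernel, window):
                total += sum(w * v for w, v in zip(w_row, x_row))
            row.append(max(0.0, total))  # ReLU; the activation is an assumption
        out.append(row)
    return out


SEQ_LEN, CHANNELS = 41, 4          # assumed input shape, not from the source
N_FILTERS, KERNEL_SIZE = 200, 6    # values quoted in the citation statement

rng = random.Random(1)
kernels = [[[rng.uniform(-0.1, 0.1) for _ in range(CHANNELS)]
            for _ in range(KERNEL_SIZE)]
           for _ in range(N_FILTERS)]
biases = [0.0] * N_FILTERS

x = random_one_hot(SEQ_LEN, CHANNELS)
feature_map = conv1d_valid(x, kernels, biases)
n_params = N_FILTERS * KERNEL_SIZE * CHANNELS + N_FILTERS

print(len(feature_map), len(feature_map[0]))  # 36 200
print(n_params)                               # 5000
```

Under these assumptions the layer maps a 41 x 4 input to a 36 x 200 feature map (41 - 6 + 1 positions, one column per filter) and holds 200 x 6 x 4 weights plus 200 biases.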
