Recurrent Neural Network for Predicting Transcription Factor Binding Sites

Shen, Zhen; Bao, Wenzheng; Huang, De-Shuang

doi:10.1038/s41598-018-33321-1

Cited by 185 publications

(108 citation statements)

References 80 publications

Supporting

Mentioning

108

Contrasting


“…Finally, a pair of reverse complement DNA sequences consist of the same words, thus knowledge could be easily transferred between them. Interestingly, k-mer embedding has recently been showed to surpass one-hot encoding in predicting transcription factor binding [26]. This suggests the general applicability of k-mer embedding in other biological fields.…”

Section: Discussion (mentioning)

confidence: 99%

DeepMicrobes: taxonomic classification for metagenomics with deep learning

Liang, Bible, Liu, et al. 2019. Preprint.


Taxonomic classification is a crucial step for metagenomics applications including disease diagnostics, microbiome analyses, and outbreak tracing. Yet it is unknown what deep learning architecture can capture microbial genome-wide features relevant to this task. We report DeepMicrobes (https://github.com/MicrobeLab/DeepMicrobes), a computational framework that can perform large-scale training on > 10,000 RefSeq complete microbial genomes and accurately predict the species-of-origin of whole metagenome shotgun sequencing reads. We show the advantage of DeepMicrobes over state-of-the-art tools in precisely identifying species from microbial community sequencing data. Therefore, DeepMicrobes expands the toolbox of taxonomic classification for metagenomics and enables the development of further deep learning-based bioinformatics algorithms for microbial genomic sequence analysis.

[…] hypothesize that deep learning can automatically discover taxonomic classification-relevant and genome-wide shared features appearing in short metagenomics sequencing reads given a well-designed deep neural network (DNN) architecture.

Deep learning has made tremendous recent advances in genomics [5]. Taking one-hot encoded DNA sequences as input, the DNNs that have been employed to genomic data fall into two major categories, convolutional neural networks (CNNs) and a hybrid of CNNs and recurrent neural networks (RNNs). For example, DeepSEA [6], PrimateAI [7] and SpliceAI [8] used CNNs to predict the impact of genetic variation. Seq2species [9] also adopted CNNs to predict the species-of-origin of 16S data. DeeperBind [10] and DanQ [11] used hybrid architectures to predict transcription factor binding and DNA accessibility. Despite the success of these applications, it remains unknown what DNN architecture and DNA encoding method are suitable for taxonomic classification of metagenomics data.

Here we describe DeepMicrobes, a k-mer embedding-based recurrent network with attention mechanism (Fig. 1a). We trained the DNN on synthetic reads from RefSeq complete bacterial and archaeal genomes. The first layer of DeepMicrobes is designed to encode k-mers to dense vectors through embedding. The vectors are fed into a bidirectional long short-term memory network (BiLSTM) followed by self-attention and a multilayer perceptron (MLP).
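The citation statement above notes that a sequence and its reverse complement consist of the same words, and the abstract describes a first layer that embeds k-mers. The tokenization step that would feed such an embedding layer can be sketched as follows. This is a minimal illustration, assuming canonical k-mers (each k-mer is replaced by the lexicographically smaller of itself and its reverse complement); the function names, the toy read, and k = 3 are hypothetical and are not taken from DeepMicrobes.

```python
# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(kmer):
    """Assumption: a k-mer and its reverse complement share one 'word'."""
    return min(kmer, reverse_complement(kmer))


def kmer_words(seq, k):
    """Slide a window of width k over seq and emit canonical k-mers."""
    return [canonical(seq[i:i + k]) for i in range(len(seq) - k + 1)]


def build_vocab(words):
    """Assign each distinct word an integer id in order of first appearance."""
    vocab = {}
    for word in words:
        vocab.setdefault(word, len(vocab))
    return vocab


read = "ACGGTCA"  # toy read, for illustration only
forward_words = kmer_words(read, 3)
reverse_words = kmer_words(reverse_complement(read), 3)

# Integer ids like these are what an embedding layer would look up.
vocab = build_vocab(forward_words)
token_ids = [vocab[w] for w in forward_words]
```

Under this assumption the reverse-complement read yields the same words in reverse order, so both strands index the same embedding rows.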


“…Recently, many studies have investigated the interpretation of neural networks and the underlying model behind real-world datasets. They utilize complex models, such as RNN and the model with attention mechanism, which comes from the field of natural language processing, to represent the complex information of biological sequences (Zuallaert et al, 2018; Luo et al, 2019; Shen et al, 2018; Pan and Shen, 2018; Pan and Yan, 2017; Li et al, 2019; Pan et al, 2018). Actually, from the diversity of DNA-protein binding, we suggest using different architectures to model motif inference for specific proteins.…”

Section: Results (mentioning)

confidence: 99%

Deepprune: Learning efficient and interpretable convolutional networks through weight pruning for predicting DNA-protein binding

Luo, Chi, Deng. 2019. Preprint.


Convolutional neural network (CNN) based methods have outperformed conventional machine learning methods in predicting the binding preference of DNA-protein binding. Although studies in the past have shown that more convolutional kernels help to achieve better performance, visualization of the model can be obscured by the use of many kernels, resulting in overfitting and reduced interpretation because the number of motifs in true models is limited. Therefore, we aim to arrive at high performance, but with limited kernel numbers, in CNN-based models for motif inference. We herein present Deepprune, a novel deep learning framework, which prunes the weights in the dense layer and fine-tunes iteratively. These two steps enable the training of CNN-based models with limited kernel numbers, allowing easy interpretation of the learned model. We demonstrate that Deepprune significantly improves motif inference performance for the simulated datasets. Furthermore, we show that Deepprune outperforms the baseline with limited kernel numbers when inferring DNA-binding sites from ChIP-seq data.

Keywords: Deep neural networks, Motif inference, Network pruning

BACKGROUND

Determining how proteins interact with DNA to regulate gene expression is essential for fully understanding many biological processes and disease states. Many DNA binding proteins have affinity for specific DNA binding sites. ChIP-seq combines chromatin immunoprecipitation (ChIP) with massively parallel DNA sequencing to identify DNA binding sites of DNA-associated proteins (Zhang et al., 2008). However, DNA sequences directly obtained by experiments typically contain noise and bias. Consequently, many computational methods have been developed to predict protein-DNA binding, including conventional statistical methods (Badis et al., 2009; Ghandi et al., 2016) and deep learning-based methods (Alipanahi et al., 2015; Zhou and Troyanskaya, 2015; Zeng et al., 2016).

Convolutional neural networks (CNNs) have attracted attention for identifying protein-DNA binding motifs in many studies (Zhou and Troyanskaya, 2015; Alipanahi et al., 2015). Genomic sequences are first encoded in one-hot format; then, a 1-D convolution operation with 4 channels is performed on them. For conventional machine learning methods, the sequence specificities of a protein are often characterized by position weight matrices (PWM) (Stormo, 2000). PWM has a direct connection to CNN-based model since the log-likelihood of the resulting PWM of each DNA sequence is exactly the sum of a constant and the convolution of the original kernel on the same sequence from the view of probability model (Ding et al., 2018). Zeng et al. (Zeng et al., 2016) experimented with different structures and hyperparameters and showed that the convolutional layers with more kernels could obtain better performance. They also showed that training models with gradient descent methods is sensitive to weight initializati...
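The link between one-hot encoding, 1-D convolution with 4 channels, and PWMs described above can be illustrated with a small sketch. It is not the Deepprune implementation: the PWM values, the uniform background of 0.25, and all function names are assumptions chosen for illustration. The kernel holds log-odds scores, so the convolution output at each position differs from the PWM log-likelihood of that window only by a constant.

```python
import math

BASES = "ACGT"  # channel order of the one-hot encoding


def one_hot(seq):
    """Encode a DNA string as a list of length-4 rows (one row per base)."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq]


def pwm_to_kernel(pwm, background=0.25):
    """Turn per-position base probabilities into a log-odds kernel."""
    return [[math.log(p / background) for p in row] for row in pwm]


def conv1d_valid(x, kernel):
    """Slide the kernel over x (no padding) and sum over positions and channels."""
    width = len(kernel)
    return [
        sum(x[i + j][c] * kernel[j][c] for j in range(width) for c in range(4))
        for i in range(len(x) - width + 1)
    ]


# Hypothetical PWM favouring the motif "TGA" (rows: positions, columns: A, C, G, T).
pwm = [
    [0.1, 0.1, 0.1, 0.7],
    [0.1, 0.1, 0.7, 0.1],
    [0.7, 0.1, 0.1, 0.1],
]

scores = conv1d_valid(one_hot("ACTGAC"), pwm_to_kernel(pwm))
best_start = scores.index(max(scores))  # start of the best-matching window

# Log-likelihood of the best window under the PWM: score plus a constant.
log_likelihood = scores[best_start] + len(pwm) * math.log(0.25)
```

The highest score falls on the window that spells the motif, and adding the constant `len(pwm) * log(0.25)` recovers the window's log-likelihood under the PWM.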


“…Recently, many deep learning methods are used for medical data analysis, such as convolutional neural networks, recurrent neural network, autoencoder and so on. However, these approaches require large-scale data [38][39][40]. The aim of this study is to develop a feature representation method to fully and effectively describe on ONH for glaucoma detection.…”

Section: Glaucoma Detection Based On Texture Feature (mentioning)

confidence: 99%

A novel glaucomatous representation method based on Radon and wavelet transform

et al. 2019


Background: Glaucoma is an irreversible eye disease caused by the optic nerve injury. Therefore, it usually changes the structure of the optic nerve head (ONH). Clinically, ONH assessment based on fundus image is one of the most useful way for glaucoma detection. However, the effective representation for ONH assessment is a challenging task because its structural changes result in the complex and mixed visual patterns.

Method: We proposed a novel feature representation based on Radon and Wavelet transform to capture these visual patterns. Firstly, Radon transform (RT) is used to map the fundus image into Radon domain, in which the spatial radial variations of ONH are converted to a discrete signal for the description of image structural features. Secondly, the discrete wavelet transform (DWT) is utilized to capture differences and get quantitative representation. Finally, principal component analysis (PCA) and support vector machine (SVM) are used for dimensionality reduction and glaucoma detection.

Results: The proposed method achieves the state-of-the-art detection performance on RIMONE-r2 dataset with the accuracy and area under the curve (AUC) at 0.861 and 0.906, respectively.

Conclusion: In conclusion, we showed that the proposed method has the capacity as an effective tool for large-scale glaucoma screening, and it can provide a reference for the clinical diagnosis on glaucoma.
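The DWT step of the method can be illustrated with a single-level transform of a short discrete signal. This is a minimal sketch that assumes the Haar wavelet, which the abstract does not specify; the function name and the toy signal are hypothetical, and the Radon, PCA, and SVM stages are not shown.

```python
import math


def haar_dwt(signal):
    """Single-level Haar DWT: return (approximation, detail) coefficients."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    scale = 1.0 / math.sqrt(2.0)
    # Scaled pairwise sums capture the coarse shape of the signal.
    approx = [(signal[i] + signal[i + 1]) * scale for i in range(0, len(signal), 2)]
    # Scaled pairwise differences capture local changes between neighbours.
    detail = [(signal[i] - signal[i + 1]) * scale for i in range(0, len(signal), 2)]
    return approx, detail


toy_signal = [4.0, 4.0, 2.0, 0.0]  # illustrative values only
approx, detail = haar_dwt(toy_signal)

# The orthonormal Haar transform preserves the signal's energy.
energy_in = sum(v * v for v in toy_signal)
energy_out = sum(v * v for v in approx) + sum(v * v for v in detail)
```

The detail coefficients are zero wherever neighbouring samples are equal, which is one way a wavelet transform can "capture differences" in the sense used above.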

