Convolutional neural network (CNN)-based methods have outperformed conventional machine learning methods in predicting DNA-protein binding preferences. Although previous studies have shown that more convolutional kernels help achieve better performance, using many kernels can cause overfitting and obscure visualization of the model, reducing its interpretability, because the number of motifs in the true model is limited. Therefore, we aim to achieve high performance with a limited number of kernels in CNN-based models for motif inference.

We herein present Deepprune, a novel deep learning framework that iteratively prunes the weights in the dense layer and fine-tunes the model. These two steps enable the training of CNN-based models with a limited number of kernels, allowing easy interpretation of the learned model. We demonstrate that Deepprune significantly improves motif inference performance on simulated datasets. Furthermore, we show that, with a limited number of kernels, Deepprune outperforms the baseline when inferring DNA-binding sites from ChIP-seq data.

Keywords: Deep neural networks, Motif inference, Network pruning

BACKGROUND

Determining how proteins interact with DNA to regulate gene expression is essential for fully understanding many biological processes and disease states. Many DNA-binding proteins have affinity for specific DNA binding sites. ChIP-seq combines chromatin immunoprecipitation (ChIP) with massively parallel DNA sequencing to identify the DNA binding sites of DNA-associated proteins (Zhang et al., 2008). However, DNA sequences obtained directly from experiments typically contain noise and bias. Consequently, many computational methods have been developed to predict protein-DNA binding, including conventional statistical methods (Badis et al., 2009; Ghandi et al., 2016) and deep learning-based methods (Alipanahi et al., 2015; Zhou and Troyanskaya, 2015; Zeng et al., 2016).
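To make the pruning idea in the abstract concrete, the following Python sketch builds a toy model on top of fixed, randomly initialized convolution kernels. It one-hot encodes sequences, scans them with 1-D kernels over four channels, max-pools the scores, and then alternates between zeroing the smallest-magnitude surviving dense-layer weights and fine-tuning the remaining ones. This section does not specify Deepprune's pruning criterion, schedule, or architecture, so the magnitude criterion, the halve-per-round schedule, the logistic output, the frozen kernels, the toy "GATA" label, and the helper names (`one_hot`, `conv_maxpool`, `prune_and_finetune`) are all hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np

BASES = "ACGT"  # row order of the one-hot matrix (an assumption of this sketch)


def one_hot(seq):
    """Encode a DNA string as a (4, len(seq)) one-hot matrix."""
    x = np.zeros((4, len(seq)))
    for j, base in enumerate(seq):
        x[BASES.index(base), j] = 1.0
    return x


def conv_maxpool(x, kernels):
    """Slide each (4, k) kernel along x (valid 1-D convolution over 4 channels)
    and keep the maximum score per kernel (global max-pooling)."""
    n_kernels, _, k = kernels.shape
    n_pos = x.shape[1] - k + 1
    scores = np.empty((n_kernels, n_pos))
    for i in range(n_kernels):
        for t in range(n_pos):
            scores[i, t] = np.sum(kernels[i] * x[:, t:t + k])
    return scores.max(axis=1)


def prune_and_finetune(feats, y, w, b, rounds=3, frac=0.5, steps=200, lr=0.1):
    """Hypothetical pruning loop: each round zeroes the smallest-magnitude
    surviving dense-layer weights, then fine-tunes only the survivors with
    gradient descent on the logistic loss (kernels stay frozen here)."""
    w = w.copy()
    mask = np.ones_like(w)
    for _ in range(rounds):
        alive = np.flatnonzero(mask)
        n_drop = int(len(alive) * frac)
        drop = alive[np.argsort(np.abs(w[alive]))[:n_drop]]
        mask[drop] = 0.0
        w *= mask
        for _ in range(steps):
            p = 1.0 / (1.0 + np.exp(-(feats @ w + b)))  # sigmoid output
            grad = p - y                                 # d(loss)/d(logit)
            w -= lr * (feats.T @ grad) / len(y) * mask   # pruned weights stay 0
            b -= lr * grad.mean()
    return w, b, mask


# Toy usage: random sequences with a hypothetical "contains GATA" label.
rng = np.random.default_rng(0)
seqs = ["".join(rng.choice(list(BASES), size=20)) for _ in range(40)]
y = np.array([1.0 if "GATA" in s else 0.0 for s in seqs])
kernels = rng.normal(size=(8, 4, 6))  # 8 random kernels of width 6
feats = np.array([conv_maxpool(one_hot(s), kernels) for s in seqs])
w, b, mask = prune_and_finetune(feats, y, rng.normal(size=8), 0.0)
print(int(mask.sum()))  # 8 -> 4 -> 2 -> 1 surviving dense weights, prints 1
```

In this toy setup each kernel feeds exactly one dense-layer weight, so pruning a weight removes that kernel's contribution to the prediction. A fuller implementation would presumably also fine-tune the convolutional kernels, which this sketch keeps fixed for brevity.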
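The kernel-PWM relationship attributed to Ding et al. (2018) in this section can be checked numerically. The Python sketch below assumes one specific mapping, a column-wise softmax that turns a (4, k) convolution kernel into a PWM whose columns sum to 1; we do not claim this is the exact construction used by Ding et al. Under this mapping, the log-likelihood of any length-k site equals the kernel's convolution score on that site plus a constant that depends only on the kernel. The function names `kernel_to_pwm`, `kernel_score`, and `pwm_log_likelihood` are hypothetical.

```python
import numpy as np

BASES = "ACGT"  # row order of kernel and PWM (an assumption of this sketch)


def kernel_to_pwm(kernel):
    """Column-wise softmax: turn a (4, k) kernel into a (4, k) PWM whose
    columns are probability distributions over A, C, G, T."""
    e = np.exp(kernel - kernel.max(axis=0))  # subtract max for numerical stability
    return e / e.sum(axis=0)


def kernel_score(kernel, site):
    """Convolution score of the kernel on one length-k window (the site)."""
    return sum(kernel[BASES.index(b), j] for j, b in enumerate(site))


def pwm_log_likelihood(pwm, site):
    """Log-probability of the site under the PWM (positions independent)."""
    return sum(np.log(pwm[BASES.index(b), j]) for j, b in enumerate(site))


rng = np.random.default_rng(1)
kernel = rng.normal(size=(4, 5))
pwm = kernel_to_pwm(kernel)
# Constant term: -sum_j log sum_b exp(kernel[b, j]); it does not depend on the site.
const = -np.sum(np.log(np.exp(kernel).sum(axis=0)))
for site in ["ACGTA", "GGGGG", "TACGT"]:
    print(np.isclose(pwm_log_likelihood(pwm, site), kernel_score(kernel, site) + const))
```

Each printed check is `True`, because the softmax normalizer contributes the same term to every site, leaving the kernel score as the only site-dependent part of the log-likelihood.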
Convolutional neural networks (CNNs) have attracted attention in many studies for identifying protein-DNA binding motifs (Zhou and Troyanskaya, 2015; Alipanahi et al., 2015). Genomic sequences are first one-hot encoded; then a 1-D convolution with 4 input channels (one per nucleotide) is applied to them. In conventional machine learning methods, the sequence specificities of a protein are often characterized by position weight matrices (PWMs) (Stormo, 2000). PWMs are directly connected to CNN-based models: from the viewpoint of a probabilistic model, the log-likelihood of each DNA sequence under the resulting PWM is exactly the convolution of the original kernel on the same sequence plus a constant (Ding et al., 2018). Zeng et al. (2016) experimented with different architectures and hyperparameters and showed that convolutional layers with more kernels could achieve better performance. They also showed that training models with gradient descent methods is sensitive to weight initializati...