The CRISPR-Cas9 system derived from adaptive immunity in bacteria and archaea has been developed into a powerful tool for genome engineering with wide-ranging applications. Optimizing single-guide RNA (sgRNA) design to improve efficiency of target cleavage is a key step for successful gene editing using the CRISPR-Cas9 system. Because not all sgRNAs that cognate to a given target gene are equally effective, computational tools have been developed based on experimental data to increase the likelihood of selecting effective sgRNAs. Despite considerable efforts to date, it still remains a big challenge to accurately predict functional sgRNAs directly from large-scale sequence data. We propose DeepCas9, a deep-learning framework based on the convolutional neural network (CNN), to automatically learn the sequence determinants and further enable the identification of functional sgRNAs for the CRISPR-Cas9 system. We show that the CNN method outperforms previous methods in both (i) the ability to correctly identify highly active sgRNAs in experiments not used in the training and (ii) the ability to accurately predict the target efficacies of sgRNAs in different organisms. Besides, we further visualize the convolutional kernels and show the match of identified sequence signatures and known nucleotide preferences. We finally demonstrate the application of our method to the design of next-generation genome-scale CRISPRi and CRISPRa libraries targeting human and mouse genomes. We expect that DeepCas9 will assist in reducing the numbers of sgRNAs that must be experimentally validated to enable more effective and efficient genetic screens and genome engineering. DeepCas9 can be freely accessed via the Internet at .
Cancer is one of the most dangerous diseases to human health. The accurate prediction of anticancer peptides (ACPs) would be valuable for the development and design of novel anticancer agents. Current deep neural network models have obtained state-of-the-art prediction accuracy for the ACP classification task. However, based on existing studies, it remains unclear which deep learning architecture achieves the best performance. Thus, in this study, we first present a systematic exploration of three important deep learning architectures: convolutional, recurrent, and convolutional-recurrent networks for distinguishing ACPs from non-ACPs. We find that the recurrent neural network with bidirectional long short-term memory cells is superior to other architectures. By utilizing the proposed model, we implement a sequence-based deep learning tool (DeepACP) to accurately predict the likelihood of a peptide exhibiting anticancer activity. The results indicate that DeepACP outperforms several existing methods and can be used as an effective tool for the prediction of anticancer peptides. Furthermore, we visualize and understand the deep learning model. We hope that our strategy can be extended to identify other types of peptides and may provide more assistance to the development of proteomics and new drugs.
Motivation Various bacterial pathogens can deliver their secreted substrates also called effectors through Type III secretion systems (T3SSs) into host cells and cause diseases. Since T3SS secreted effectors (T3SEs) play important roles in pathogen–host interactions, identifying them is crucial to our understanding of the pathogenic mechanisms of T3SSs. However, the effectors display high level of sequence diversity, therefore making the identification a difficult process. There is a need to develop a novel and effective method to screen and select putative novel effectors from bacterial genomes that can be validated by a smaller number of key experiments. Results We develop a deep convolution neural network to directly classify any protein sequence into T3SEs or non-T3SEs, which is useful for both effector prediction and the study of sequence-function relationship. Different from traditional machine learning-based methods, our method automatically extracts T3SE-related features from a protein N-terminal sequence of 100 residues and maps it to the T3SEs space. We train and test our method on the datasets curated from 16 species, yielding an average classification accuracy of 83.7% in the 5-fold cross-validation and an accuracy of 92.6% for the test set. Moreover, when comparing with known state-of-the-art prediction methods, the accuracy of our method is 6.31–20.73% higher than previous methods on a common independent dataset. Besides, we visualize the convolutional kernels and successfully identify the key features of T3SEs, which contain important signal information for secretion. Finally, some effectors reported in the literature are used to further demonstrate the application of DeepT3. Availability and implementation DeepT3 is freely available at: https://github.com/lje00006/DeepT3. Supplementary information Supplementary data are available at Bioinformatics online.
Alternative splicing (AS) is a genetically and epigenetically regulated pre-mRNA processing to increase transcriptome and proteome diversity. Comprehensively decoding these regulatory mechanisms holds promise in getting deeper insights into a variety of biological contexts involving in AS, such as development and diseases. We assembled splicing (epi)genetic code, DeepCode, for human embryonic stem cell (hESC) differentiation by integrating heterogeneous features of genomic sequences, 16 histone modifications with a multi-label deep neural network. With the advantages of epigenetic features, DeepCode significantly improves the performance in predicting the splicing patterns and their changes during hESC differentiation. Meanwhile, DeepCode reveals the superiority of epigenomic features and their dominant roles in decoding AS patterns, highlighting the necessity of including the epigenetic properties when assembling a more comprehensive splicing code. Moreover, DeepCode allows the robust predictions across cell lineages and datasets. Especially, we identified a putative H3K36me3-regulated AS event leading to a nonsense-mediated mRNA decay of BARD1. Reduced BARD1 expression results in the attenuation of ATM/ATR signalling activities and further the hESC differentiation. These results suggest a novel candidate mechanism linking histone modifications to hESC fate decision. In addition, when trained in different contexts, DeepCode can be expanded to a variety of biological and biomedical fields.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.