The CRISPR/Cas9 system is widely used in a broad range of gene-editing applications. While this editing technique is quite accurate in the target region, there may be many unplanned off-target sites (OTSs). Consequently, a plethora of computational methods have been developed to predict off-target cleavage sites given a guide RNA and a reference genome. However, these methods are based on small-scale datasets (only tens to hundreds of OTSs) produced by experimental OTS-detection techniques that have a low signal-to-noise ratio. Recently, CHANGE-seq, a new in vitro experimental technique to detect OTSs, was used to produce a dataset of unprecedented scale and quality (>200 000 OTSs over 110 guide RNAs). In addition, the same study included in cellula GUIDE-seq experiments for 58 of the guide RNAs. Here, we fill the gap in previous computational methods by utilizing these data to systematically evaluate data processing and formulation of the CRISPR OTS prediction problem. Our evaluations show that data transformation as a pre-processing phase is critical prior to model training. Moreover, we demonstrate the improvement gained by adding potential inactive OTSs to the training datasets. Furthermore, our results point to the importance of adding the number of mismatches between guide RNAs and their OTSs as a feature. Finally, we present predictive off-target in cellula models based on both in vitro and in cellula data and compare them to state-of-the-art methods in predicting true OTSs. Our conclusions will be instrumental in any future development of an off-target predictor based on high-throughput datasets.
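As a rough illustration of two ideas named in the abstract above, the following Python sketch computes the number of mismatches between a guide RNA and a candidate site, and applies a transformation to a read count before training. It is not the authors' code: the function names are hypothetical, the abstract does not say which transformation was used (log(1 + x) is assumed here purely as an example), and bulges or ambiguous bases such as N are not handled.

```python
import math


def count_mismatches(guide: str, site: str) -> int:
    """Count positions where guide and site differ (Hamming distance).

    Assumes both sequences are already aligned and of equal length;
    insertions/deletions (bulges) are not modelled in this sketch.
    """
    if len(guide) != len(site):
        raise ValueError("guide and site must have the same length")
    return sum(1 for g, s in zip(guide.upper(), site.upper()) if g != s)


def transform_reads(read_count: float) -> float:
    """Assumed example transformation: log(1 + reads).

    The abstract only states that a data transformation is critical;
    the specific choice here is an illustrative assumption.
    """
    if read_count < 0:
        raise ValueError("read_count must be non-negative")
    return math.log1p(read_count)


def build_features(guide: str, site: str) -> dict:
    """Bundle the mismatch count with the raw sequences as a feature record."""
    return {
        "guide": guide.upper(),
        "site": site.upper(),
        "n_mismatches": count_mismatches(guide, site),
    }
```

In such a setup, sites with zero observed reads would naturally serve as the "potential inactive OTSs" the abstract describes, since `transform_reads(0)` maps them to a label of zero.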
Motivation: mRNA degradation plays critical roles in post-transcriptional gene regulation. A major component of mRNA degradation is determined by 3’UTR elements. Hence, researchers are interested in studying mRNA dynamics as a function of 3’UTR elements. A recent study measured the mRNA degradation dynamics of tens of thousands of 3’UTR sequences using a massively parallel reporter assay. However, the computational approach used to model mRNA degradation was based on a simplifying assumption of a linear degradation rate. Consequently, the underlying mechanism of 3’UTR elements is still not fully understood. Results: Here, we developed deep neural networks to predict mRNA degradation dynamics and interpreted the networks to identify regulatory elements in the 3’UTR and their positional effect. Given an input of a 110 nt-long 3’UTR sequence and an initial mRNA level, the model predicts mRNA levels at 8 consecutive time points. Our deep neural networks significantly improve prediction performance of mRNA degradation dynamics compared to extant methods for the task. Moreover, we demonstrated that models predicting the dynamics of two identical 3’UTR sequences, differing by their poly(A) tail, perform better than single-task models. On the interpretability front, by using Integrated Gradients, our CNN models identified known and novel cis-regulatory sequence elements of mRNA degradation. By applying a novel systematic evaluation of model interpretability, we demonstrated that the RNN models are inferior to the CNN models in terms of interpretability and that a random-initialization ensemble improves both prediction and interpretability performance. Moreover, using a mutagenesis analysis, we newly discovered the positional effect of various 3’UTR elements. Availability: All the code developed through this study is available at github.com/OrensteinLab/DeepUTR/. Supplementary information: Supplementary data are available at Bioinformatics online.
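The input described in the abstract above (a 110 nt 3’UTR sequence plus an initial mRNA level) could be prepared for a neural network roughly as follows. This is a minimal sketch under stated assumptions, not the DeepUTR implementation: one-hot encoding over A/C/G/T is assumed as the sequence representation, and the names `one_hot_encode` and `make_model_input` are hypothetical.

```python
NUCLEOTIDES = "ACGT"
SEQ_LENGTH = 110   # 3'UTR length stated in the abstract
N_TIME_POINTS = 8  # number of predicted time points stated in the abstract


def one_hot_encode(seq: str, length: int = SEQ_LENGTH) -> list:
    """Encode a sequence as a length x 4 matrix of 0.0/1.0 values.

    RNA 'U' is treated as 'T'; any other unknown base (e.g. 'N')
    becomes an all-zero row. Both choices are assumptions of this sketch.
    """
    seq = seq.upper().replace("U", "T")
    if len(seq) != length:
        raise ValueError(f"expected a sequence of length {length}, got {len(seq)}")
    return [[1.0 if base == n else 0.0 for n in NUCLEOTIDES] for base in seq]


def make_model_input(seq: str, initial_level: float) -> dict:
    """Pair the encoded sequence with the initial mRNA level.

    A model consuming this record would output N_TIME_POINTS values,
    one mRNA level per subsequent time point.
    """
    return {
        "sequence": one_hot_encode(seq),
        "initial_level": float(initial_level),
    }
```

An attribution method such as Integrated Gradients would then assign one score per entry of the `sequence` matrix, which is what makes per-position regulatory elements readable from the model.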
The CRISPR/Cas9 system is widely used in a broad range of gene-editing applications. While this gene-editing technique is quite accurate in the target region, there may be many unplanned off-target edited sites. Consequently, a plethora of computational methods have been developed to predict off-target cleavage sites given a guide RNA and a reference genome. However, these methods are based on small-scale datasets (only tens to hundreds of off-target sites) produced by experimental off-target detection techniques that have a low signal-to-noise ratio. Recently, CHANGE-seq, a new in vitro experimental technique to detect off-target sites, was used to produce a dataset of unprecedented scale and quality (more than 200,000 off-target sites over 110 guide RNAs). In addition, the same study included GUIDE-seq experiments for 58 of the guide RNAs to produce in vivo measurements of off-target sites. Here, we fill the gap in previous computational methods by utilizing these data to perform a systematic evaluation of data processing and formulation of the CRISPR off-target site prediction problem. Our evaluations show that data transformation as a pre-processing phase is critical prior to model training. Moreover, we demonstrate the improvement gained by adding potential inactive off-target sites to the training datasets. Furthermore, our results point to the importance of adding the number of mismatches between the guide RNA and the off-target site as a feature. Finally, we present predictive off-target in vivo models based on transfer learning from in vitro data. Our conclusions will be instrumental to any future development of an off-target predictor based on high-throughput datasets.