CRISPR from Prevotella and Francisella 1 (Cpf1) is an effector endonuclease of the class 2 CRISPR-Cas (clustered regularly interspaced short palindromic repeats-CRISPR-associated proteins) gene editing system. We developed a high-throughput method for evaluating Cpf1 activity as a function of target sequence composition in mammalian cells. A library of >11,000 target sequence and guide RNA pairs was delivered into human cells using lentiviral vectors. Subsequent delivery of Cpf1 into this cell library induced insertions and deletions (indels) at the integrated synthetic target sequences, which allowed en masse evaluation of Cpf1 activity by deep sequencing. With this approach, we determined the protospacer-adjacent motif sequences of two Cpf1 nucleases, one from Acidaminococcus sp. BV3L6 (hereafter referred to as AsCpf1) and the other from Lachnospiraceae bacterium ND2006 (hereafter referred to as LbCpf1). We also defined target-sequence-dependent activity profiles of AsCpf1, which enabled the development of a web tool that predicts the indel frequencies for given target sequences (http://big.hanyang.ac.kr/cindel). Both the Cpf1 characterization profile and the in vivo high-throughput evaluation method will greatly facilitate Cpf1-based genome editing.
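As a rough illustration of how an indel frequency could be derived from deep-sequencing reads of one integrated target, the following Python sketch counts reads whose length differs from the reference. It is an assumption for illustration, not the authors' pipeline: the function name `indel_frequency`, the length-based indel call, and the premise that reads are already trimmed to the target window are all hypothetical. A real analysis would typically align reads and account for background sequencing errors.

```python
from typing import Sequence


def indel_frequency(reference: str, reads: Sequence[str]) -> float:
    """Return the percentage of reads that appear to carry an indel.

    Simplifying assumption: every read spans exactly the target window,
    so any read whose length differs from the reference is counted as
    an insertion or deletion. Substitutions are ignored.
    """
    if not reads:
        raise ValueError("no reads supplied")
    indel_reads = sum(1 for read in reads if len(read) != len(reference))
    return 100.0 * indel_reads / len(reads)


# Hypothetical toy data: two unedited reads, one deletion, one insertion.
reference = "TTTAGCTAGCTA"
reads = [
    reference,
    reference,
    reference[:5] + reference[7:],  # 2-bp deletion
    reference + "A",                # 1-bp insertion
]
frequency = indel_frequency(reference, reads)
```

With these four toy reads, two differ in length from the reference, so `frequency` is 50.0.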
We present two algorithms to predict the activity of AsCpf1 guide RNAs. Indel frequencies for 15,000 target sequences were used in a deep-learning framework based on a convolutional neural network to train Seq-deepCpf1. We then incorporated chromatin accessibility information to create the better-performing DeepCpf1 algorithm for cell lines for which such information is available. We show that both algorithms outperform previous machine learning algorithms on our own and published data sets.
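The abstract does not describe how target sequences are fed to the convolutional network. A common choice for sequence-based CNNs is one-hot encoding, sketched below in Python. The function name `one_hot_encode` and the A, C, G, T channel order are assumptions for illustration, not details taken from the paper.

```python
BASES = "ACGT"  # assumed channel order; the source does not specify one


def one_hot_encode(sequence):
    """Convert a DNA string into a list of 4-element one-hot rows.

    Each row has a single 1 in the column of the matching base,
    following the order given in BASES.
    """
    encoded = []
    for base in sequence.upper():
        if base not in BASES:
            raise ValueError(f"unexpected base: {base!r}")
        encoded.append([1 if base == b else 0 for b in BASES])
    return encoded


# Hypothetical 4-nt example resembling a TTTN-style motif.
matrix = one_hot_encode("TTTA")
# matrix == [[0, 0, 0, 1], [0, 0, 0, 1], [0, 0, 0, 1], [1, 0, 0, 0]]
```

The resulting rows form a length-by-4 matrix that a 1D convolution can scan along the sequence axis. Additional inputs such as chromatin accessibility would have to be supplied separately.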
We evaluated SpCas9 activities at 12,832 target sequences using a high-throughput approach based on a human cell library containing single-guide RNA–encoding and target sequence pairs. Deep learning–based training on this large dataset of SpCas9-induced indel frequencies led to the development of a SpCas9 activity–predicting model named DeepSpCas9. When tested against independently generated datasets (our own and those published by other groups), DeepSpCas9 showed high generalization performance. DeepSpCas9 is available at http://deepcrispr.info/DeepSpCas9.
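The abstract reports high generalization performance without naming the metric. One common way to compare predicted and measured indel frequencies is Spearman rank correlation. The pure-Python sketch below is an assumption for illustration, and the function names are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def spearman(predicted, measured):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    if len(predicted) != len(measured) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    return pearson(average_ranks(predicted), average_ranks(measured))
```

A value near 1 would indicate that the model orders targets by activity much as the measurements do, even if the absolute predicted frequencies are offset.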