Genome editing technologies are rapidly evolving, and analysis of deep sequencing data from target or off-target regions is necessary for measuring editing efficiency and evaluating safety. However, no existing software analyzes base editor outcomes, performs allele-specific quantification, or incorporates biologically informed and scalable alignment approaches. Here, we present CRISPResso2 to fill this gap and illustrate its functionality by experimentally measuring and analyzing the editing properties of six genome editing agents.
bioRxiv preprint first posted online Aug. 15, 2018; doi: http://dx.doi.org/10.1101/392217. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
The field of genome editing is rapidly advancing, and the technologies to modify the genome are becoming increasingly accurate, efficient and versatile 1. For example, base editors, a recent class of genome editing technology, harness the targeting properties of RNA-guided endonucleases to precisely change one nucleotide in a predictable manner 2,3,4.
As sequencing costs decrease and access to next-generation sequencing machines becomes more widespread, targeted amplicon sequencing is becoming the gold standard for the validation and characterization of genome editing experiments.
CRISPResso2 introduces five key innovations for the analysis of genome editing data: (1) comprehensive analysis of sequencing data from base editors; (2) allele-specific quantification of heterozygous references; (3) a novel biologically informed alignment algorithm; (4) ultra-fast processing time; and (5) a batch mode for analyzing and comparing multiple editing experiments.
Existing software packages for the analysis of data generated by genome editing experiments are designed only to analyze cleavage events resulting from nuclease activity 5,6,7,8,9,10. CRISPResso2 (http://crispresso2.pinellolab.org) is the first comprehensive software specifically designed to analyze base editor data from amplicon sequencing, in addition to quantifying and visualizing indels from other nucleases. It takes raw FASTQ sequencing files as input and outputs reports describing the frequencies and efficiencies of base editing activity, plots showing base substitutions across the entire amplicon region (Fig. 1a), and nucleotide substitution frequencies for a region specified by the user (Fig. 1b). Additionally, users can specify the nucleotide substitution (e.g., C->T or A->G) that is relevant for the base editor used, and publication-quality plots are produced for the nucleotides of interest, with a heatmap showing conversion efficiency (Fig. 1c).
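To make the notion of per-position conversion efficiency concrete, the following is a minimal sketch and not CRISPResso2's implementation. It assumes the reads have already been aligned to the reference amplicon and padded to the same length, with `-` marking deleted positions. The function name `conversion_frequencies` and the toy sequences are hypothetical and chosen only for illustration.

```python
from collections import Counter


def conversion_frequencies(reference, aligned_reads, from_nuc="C", to_nuc="T"):
    """Fraction of covering reads carrying `to_nuc` at each reference
    position whose reference base is `from_nuc`.

    Assumption: every read in `aligned_reads` has the same length as
    `reference`, and '-' marks a deleted position (not counted as coverage).
    """
    freqs = {}
    for pos, ref_base in enumerate(reference):
        if ref_base != from_nuc:
            continue  # only positions the base editor could convert
        counts = Counter(read[pos] for read in aligned_reads)
        covered = sum(n for base, n in counts.items() if base != "-")
        freqs[pos] = counts[to_nuc] / covered if covered else 0.0
    return freqs


# Toy data (hypothetical): a 7-nt amplicon and four aligned reads.
reference = "ACGTCCA"
aligned_reads = [
    "ACGTCCA",  # unedited
    "ATGTCCA",  # C->T at position 1
    "ATGTTCA",  # C->T at positions 1 and 4
    "A-GTCCA",  # deletion at position 1
]

freqs = conversion_frequencies(reference, aligned_reads)
for pos, freq in sorted(freqs.items()):
    print(f"position {pos}: C->T frequency {freq:.2f}")
```

Passing `from_nuc="A"` and `to_nuc="G"` would give the analogous table for an A->G editor. Excluding deleted positions from the denominator is one possible design choice; a real pipeline would also need to handle insertions, base quality, and the choice of quantification window.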
Reads (left) can be assigned to each allele using CRISPResso2 (right) to achieve accurate quantification of genome editing at genomic loci with mu...