CRISPR-Cas loci encode highly diversified prokaryotic adaptive defense systems that have recently become popular for their applications in gene editing and beyond. Bioinformatic tools that systematically detect and classify CRISPR-Cas systems are in increasing demand, but their development has been challenged by the complex, dynamic nature of these systems and their rapidly expanding classification. Here, we developed CRISPRCasTyper, a new automated software tool with improved capabilities for identifying and typing CRISPR arrays and cas loci across prokaryotic sequences, based on the latest classification and nomenclature (39 subtypes/variants) (Makarova et al. 2020; Pinilla-Redondo et al. 2019). As a novel feature, CRISPRCasTyper uses a machine learning approach to subtype CRISPR arrays based on the sequences of their direct repeats. This enables typing of orphan and distant arrays, which are common in, for example, fragmented metagenomic assemblies. Furthermore, the tool provides a graphical output in which CRISPR arrays and cas operon arrangements are visualized as colored gene maps, aiding the annotation of partial and novel systems through synteny. Moreover, CRISPRCasTyper can resolve hybrid CRISPR-Cas systems and detect loci spanning the ends of sequences with a circular topology, such as complete genomes and plasmids. CRISPRCasTyper was benchmarked against a manually curated set of 31 subtypes/variants, achieving a median accuracy of 98.6%. Altogether, we present an up-to-date, freely available software pipeline for improved automated prediction of CRISPR-Cas loci in genomic sequences.
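The abstract does not say how loci spanning the ends of circular sequences are found. One common approach, shown here only as an illustrative assumption and not as CRISPRCasTyper's actual implementation, is to append the start of the sequence to its end before searching and then map hit coordinates back modulo the sequence length. The function name `find_motif_circular` is hypothetical, and the exact-match search is a deliberate simplification: real repeat and gene detection tolerates mismatches.

```python
from typing import List, Tuple


def find_motif_circular(seq: str, motif: str) -> List[Tuple[int, int]]:
    """Find exact occurrences of `motif` in a circular sequence (hypothetical helper).

    Returns 0-based (start, end) pairs with an inclusive end, both taken
    modulo len(seq). A hit that spans the origin has end < start.
    """
    n, m = len(seq), len(motif)
    if m == 0 or m > n:
        return []
    # Append the first m-1 bases so that a match crossing the origin
    # becomes a contiguous substring of the extended sequence.
    extended = seq + seq[: m - 1]
    hits = []
    start = extended.find(motif)
    while start != -1:
        # Every match starts before n, so each circular position is reported once.
        hits.append((start, (start + m - 1) % n))
        start = extended.find(motif, start + 1)
    return hits
```

For example, `find_motif_circular("GATTACA", "CAGA")` returns `[(5, 1)]`: the match begins near the end of the sequence and wraps around to position 1, which a plain linear search would miss.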
Implementation: CRISPRCasTyper is available through conda and PyPI under the MIT license (https://github.com/Russel88/CRISPRCasTyper) and as a web server.