2019
DOI: 10.1101/817619
Preprint

CRISPRCasIdentifier: Machine learning for accurate identification and classification of CRISPR-Cas systems

Abstract: CRISPR-Cas genes are extraordinarily diverse and evolve rapidly when compared to other prokaryotic genes. With the rapid increase in newly sequenced archaeal and bacterial genomes, manual identification of CRISPR-Cas systems is no longer viable. Thus, an automated approach is required for advancing our understanding of the evolution and diversity of these systems, and for finding new candidates for genome engineering in eukaryotic models. In this paper, we introduce a holistic strategy that combines regression…

Cited by 5 publications
(4 citation statements)
references
References 44 publications
“…Complete and draft bacterial genomes were downloaded from NCBI. CRISPR-Cas systems were annotated using CRISPRcasIdentifier [62] and Casboundary [63], and CRISPR arrays were extracted from genomes only containing I-E (4,991 arrays), I-F (2,632 arrays), II-A (211 arrays), and II-C (636 arrays) systems using CRISPRidentify [64]. Array orientations were then detected using CRISPRstrand [65] followed by manual curation.…”
Section: Methods (mentioning)
confidence: 99%
“…In more detail, we detect the first CRISPR arrays using the ML-based CRISPRidentify (Component 4), as an array is an evidence for the existence of a CRISPR system and it allows us to determine the repeat, which is subsequently used to determine the anti-repeat part of the tracrRNA via RNA–RNA interaction prediction. Additional evidence is acquired by the identification of cas9/cas12 genes (Component 5) using CRISPRcasIdentifier (Padilha et al, 2020), and computing the distance to the formed tracrRNA candidates. Each of the listed factors is provided with the corresponding certainty score.…”
Section: Methods (mentioning)
confidence: 99%
“…Additionally, antiSMASH 6.01 was used to identify biosynthesis gene clusters (BGCs) and metabolic gene clusters (MGCs) within the genomes (Blin et al, 2019). The CRISPRCasFinder and CRISPRcasIdentifier 14 servers were used to annotate the CRISPR Cas system (Couvin et al, 2018; Padilha et al, 2020). The presence of Cas enzymes were further confirmed by blasting each sequence in the UniProt database (The UniProt Consortium, 2023).…”
Section: Genome Functional Analysis (mentioning)
confidence: 99%