Arrays of clustered, regularly interspaced short palindromic repeats (CRISPR) are widespread in the genomes of many bacteria and almost all archaea. These arrays are composed of direct repeats of 24-47 bp separated by similarly sized non-repetitive sequences (spacers). It was recently shown experimentally that CRISPR arrays, together with a group of associated proteins, confer resistance to phage. Following exposure to phage, bacteria integrate new spacer sequences that are derived from the phage genome. Acquisition of these spacers enables the bacterial cell to shut down the phage attack, presumably by an RNA-interference-like mechanism. This Progress article discusses the structure and function of CRISPRs and the implications of this new antiviral mechanism in bacteria.

Bacteriophages constitute the most populous life-forms on Earth 1. In sea water, an environment in which phage abundance has been extensively studied, it has been estimated that there are 5-10 phage for every bacterial cell 2. Despite being outnumbered by phage, bacteria proliferate and avoid extinction by using a battery of innate phage-resistance mechanisms such as restriction enzymes and abortive infection 3. In this Progress article we describe the CRISPR system, a recently discovered defence mechanism that is remarkable because it confers acquired phage resistance in Bacteria and Archaea. The hallmark of this system is its arrays of short direct repeats interspersed with non-repetitive spacer sequences, the so-called Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR). Additional components of the system include CRISPR-associated (CAS) genes and a leader sequence (Fig. 1A).
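The repeat-spacer architecture described above can be made concrete with a minimal Python sketch. It is an illustration only, not one of the published CRISPR-detection tools. The function name `find_crispr_like_arrays`, its parameters, and the requirement that repeats match exactly are all assumptions made for this example. The spacer bounds simply reuse the 24-47 bp repeat range, on the reading that spacers are "similarly sized".

```python
def find_crispr_like_arrays(seq, repeat_len=29, min_spacer=24,
                            max_spacer=47, min_repeats=3):
    """Naive scan for exact direct repeats separated by spacers.

    Returns a list of (repeat, start_positions) tuples. All names and
    thresholds here are illustrative assumptions, not a published method.
    """
    seq = seq.upper()
    arrays = []
    i = 0
    while i + repeat_len <= len(seq):
        repeat = seq[i:i + repeat_len]
        positions = [i]
        cursor = i
        while True:
            # The next copy must start after a spacer of allowed length.
            window_start = cursor + repeat_len + min_spacer
            # str.find needs the whole match to fit before window_end.
            window_end = cursor + repeat_len + max_spacer + repeat_len
            nxt = seq.find(repeat, window_start, window_end)
            if nxt == -1:
                break
            positions.append(nxt)
            cursor = nxt
        if len(positions) >= min_repeats:
            arrays.append((repeat, positions))
            i = positions[-1] + repeat_len  # skip past the reported array
        else:
            i += 1
    return arrays


def spacers_of(seq, repeat, positions):
    """Return the sequences lying between consecutive repeat copies."""
    seq = seq.upper()
    return [seq[a + len(repeat):b] for a, b in zip(positions, positions[1:])]
```

Calling `find_crispr_like_arrays` on a genome string and passing each hit to `spacers_of` would list candidate spacers. Real repeats can carry mismatches and vary in length between loci, so a practical search would need approximate matching and a scan over several values of `repeat_len`.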
Brief history of CRISPR research

The first report of a CRISPR array came in 1987 from Ishino et al., who found 14 repeats of 29 bp interspersed with 32-33 bp non-repeating spacer sequences 4,5, adjacent to the isozyme-converting alkaline phosphatase (iap) gene in Escherichia coli. In subsequent years similar CRISPR arrays were found in Mycobacterium tuberculosis 6, Haloferax mediterranei 7, Methanocaldococcus jannaschii 8, Thermotoga maritima 9 and other bacteria and archaea. The accumulation of sequenced microbial genomes allowed genome-wide computational searches for CRISPRs (the first such analysis was carried out by Mojica et al. in 2000 10), and the most recent computational analyses revealed that CRISPRs are found in ~40% of bacterial and ~90% of archaeal genomes sequenced to date 11,12 (Box 1).

In parallel with the initial appreciation of the abundance of CRISPRs 13, Jansen et al. identified four CRISPR-associated (CAS) genes that were almost always found adjacent to the repeat arrays 14. Subsequent studies initiated by Koonin and colleagues 15,16 and Haft et al. 17 uncovered 25-45 additional CAS genes appearing in close proximity to the arrays. The same set of genes is absent from genomes that lack CRISPRs.

Several hypotheses for the function of CRISPRs have been proposed. Early in 1995, Mojica et al. suggested that the repeats were involved i...