Although Kraken's k-mer-based approach provides fast taxonomic classification of metagenomic sequence data, its large memory requirements can be limiting for some applications. Kraken 2 improves upon Kraken 1 by reducing memory usage by 85%, allowing greater amounts of reference genomic data to be used, while maintaining high accuracy and increasing speed five-fold. Kraken 2 also introduces a translated search mode, providing increased sensitivity in viral metagenomics analysis.

Assigning taxonomic labels to sequencing reads is an important part of many computational genomics pipelines for metagenomics projects. Recent years have seen several approaches to accomplish this task in a time-efficient manner 1-3. Kraken 4 used a memory-intensive algorithm that associates short genomic substrings (k-mers) with lowest common ancestor (LCA) taxa. Kraken and related tools like KrakenUniq 5 have proven highly efficient and accurate in other tool comparisons 6,7. But Kraken's high memory requirements force many researchers to either use a reduced-sensitivity MiniKraken database 8,9, or to build and use many indexes over subsets of the reference sequences 10,11. Its memory requirements can easily exceed 100 GB 7, especially when the reference data includes large eukaryotic genomes 12,13. Here we introduce Kraken 2, which provides a major reduction in memory usage as well as faster classification, a spaced-seed searching scheme, a translated search mode for matching in amino acid space, and continued compatibility with the Bracken 14 species-level quantification algorithm.

Kraken 2 addresses the issue of large memory requirements through two changes to Kraken 1's data structures and algorithms. While Kraken 1 used a sorted list of k-mer/LCA pairs indexed by minimizers 15, Kraken 2 introduces a probabilistic, compact hash table to map minimizers to LCAs. This table uses one-third of the memory of a standard hash table, at the cost of some specificity and accuracy.
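The idea of mapping minimizers to LCA taxa can be illustrated with a minimal Python sketch. It is not Kraken 2's implementation. It assumes a lexicographic minimizer ordering, ignores reverse complements and spaced seeds, and uses an ordinary `dict` in place of the probabilistic compact hash table. The toy taxonomy `PARENT` and the helper names `minimizer`, `minimizers`, `lca`, and `build_index` are hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical toy taxonomy (child -> parent); the root (1) is its own parent.
PARENT: Dict[int, int] = {1: 1, 2: 1, 3: 2, 4: 2}


def minimizer(kmer: str, l: int) -> str:
    """Return the lexicographically smallest l-mer inside one k-mer (assumed ordering)."""
    return min(kmer[i:i + l] for i in range(len(kmer) - l + 1))


def minimizers(seq: str, k: int, l: int) -> Set[str]:
    """Return the distinct minimizers over all k-mers of a sequence."""
    return {minimizer(seq[i:i + k], l) for i in range(len(seq) - k + 1)}


def lca(a: int, b: int, parent: Dict[int, int]) -> int:
    """Lowest common ancestor of two taxa in a rooted tree given as a parent map."""
    ancestors = set()
    while True:
        ancestors.add(a)
        if parent[a] == a:  # reached the root
            break
        a = parent[a]
    while b not in ancestors:
        b = parent[b]
    return b


def build_index(refs: Iterable[Tuple[int, str]], k: int, l: int,
                parent: Dict[int, int]) -> Dict[str, int]:
    """Map each minimizer to the LCA of every taxon whose sequence contains it."""
    table: Dict[str, int] = {}
    for taxon, seq in refs:
        for m in minimizers(seq, k, l):
            table[m] = lca(table[m], taxon, parent) if m in table else taxon
    return table


# Two toy reference sequences assigned to sibling taxa 3 and 4.
index = build_index([(3, "ACGTACGT"), (4, "ACGTTTTT")], k=5, l=3, parent=PARENT)
print(sorted(index.items()))
# [('ACG', 2), ('CGT', 2), ('GTT', 4), ('TTT', 4)]
```

In this toy example the two references contain eight distinct 5-mers but only four distinct minimizers, and the minimizers shared by both sequences are assigned to their common parent taxon.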
Additionally, Kraken 2 stores only minimizers (of length ℓ, ℓ ≤ k) from the reference sequence library in the data structure, whereas Kraken 1 stored all k-mers. Kraken 2's index for a reference database consisting of 9.1 Gbp of genomic sequence uses 10.6 gigabytes of memory at classification time. Kraken 1's index for the same reference set uses 72.4 gigabytes of memory for classification (Figure 1a, Supplementary Table S1). In general, a Kraken 2 database is about 15% as large as a Kraken 1 database over the same references (Supplementary Figure S1).

Kraken 2's approach is faster than Kraken 1's because only the distinct minimizers from the query (read) trigger accesses to the hash table. A similar minimizer-based approach has proven useful in accelerating read alignment 16. Kraken 2 additionally provides a hash-based filtering approach that subsamples the set of minimizer/LCA pairs included in the table, allowing the user to specify a target hash table size; smaller hash tables yield a lower memory footprint and higher classification throughput at the expens...