Sequence matching with long k-mers is increasingly used in many bioinformatic applications, including metagenomic sequence classification. The accuracy of these downstream applications relies on the density of the reference databases, which, luckily, are rapidly growing. While the increased density provides hope for dramatic improvements in accuracy, scalability is a concern. The k-mers are kept in memory at query time, and storing all k-mers of these ever-expanding databases is fast becoming impractical. Several strategies for subsampling k-mers have been proposed, including minimizers and finding taxon-specific k-mers. However, we contend that these strategies are inadequate, especially when reference sets are taxonomically imbalanced, as are most microbial libraries. In this paper, we specifically ask the question: Given limited memory, what is the best strategy to select a subset of k-mers from an ultra-large dataset to include in a library such that the classification of reads suffers the least? We explore strategies to achieve this goal and present a set of experiments demonstrating the limitations of existing approaches, especially for novel and poorly sampled groups. We propose a library construction algorithm called KRANK (K-mer RANKer) that combines several components, including a hierarchical selection strategy with adaptive size restrictions and an equitable coverage strategy. We implement KRANK in highly optimized code and combine it with the locality-sensitive-hashing classifier CONSULT-II. Our method reduces memory consumption from roughly 140Gb to 6, 12, or 24Gb, with only a 3.8%, 2.5%, or 0.5% loss in the F1 score, respectively. We show in extensive analyses that KRANK outperforms alternatives in both taxonomic classification and taxonomic profiling, using reasonable memory sizes.

Code availability

The implementation is available at https://github.com/bo1929/KRANK.