“…However, such methods do not provide guarantees on the accuracy of their approximations that hold simultaneously for all k-mers, or for all of the most frequent ones. In recent years, other problems closely related to counting k-mers have been studied, including how to efficiently index [38,15,30,28], represent [7,10,1,14,29,17,44], query [53,54,60,55,5,27], and store [18,35,16,43] the massive collections of sequences or k-mers extracted from the data. A natural approach to reducing computational demands is to analyze a small sample instead of the entire dataset.…”
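The sampling idea can be illustrated with a minimal sketch. It is not the method of any cited work. It assumes reads are plain strings and that m reads are drawn uniformly with replacement. Each k-mer count in the sample is then scaled by n/m. The helper names `count_kmers` and `estimate_kmer_counts` are hypothetical. The scaled estimate is unbiased for every k-mer, but it carries no simultaneous accuracy guarantee across all k-mers, which is exactly the gap noted above.

```python
import random
from collections import Counter


def count_kmers(reads, k):
    """Exact k-mer counts over a collection of reads (hypothetical helper)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def estimate_kmer_counts(reads, k, m, seed=0):
    """Estimate k-mer counts from m reads sampled uniformly with replacement.

    Assumption: scaling each sample count by n/m yields an unbiased
    estimate of the full-dataset count, but with no bound that holds
    for all k-mers at once.
    """
    rng = random.Random(seed)
    n = len(reads)
    sample = [reads[rng.randrange(n)] for _ in range(m)]
    scale = n / m
    return {kmer: c * scale for kmer, c in count_kmers(sample, k).items()}


reads = ["ACGTACGT", "ACGTTTGA", "GGACGTAC"]
exact = count_kmers(reads, 3)
approx = estimate_kmer_counts(reads, 3, m=2)
```

Here every read of length 8 contributes six 3-mers, so the scaled estimates always sum to the true total of 18. Individual k-mer estimates, however, may be far from their exact counts when m is small.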