“…Even though the Jaccard and minHash sketches are regularly used as a measure of the k-mer content similarity in computational biology software, the weighted Jaccard similarity has been heavily studied and used in other contexts, such as large database document classification and retrieval (e.g., Manasse et al., 2010; Shrivastava, 2016; Wu et al., 2017), near duplicate image detection (Chum et al., 2008), duplicate news story detection (Alonso et al., 2013), source code deduplication (Markovtsev and Kant, 2017), time series indexing (Luo and Shrivastava, 2017), hierarchical topic extraction (Gollapudi and Panigrahy, 2006), or malware classification (Drew et al., 2017) and detection (Raff and Nicholas, 2017).…”