“…However, the bitvectors of pairs of genomes from different species are recalcitrant to compression, even when the species are related. Run-length encoding expands those files by a factor of two (Figure 3, inset in the left panel). RRR expands most of them slightly (by a factor of 1.1) and manages to compress only a few pairs, with rate 1.25 (Figure 8 in the supplement). The same happens with pairs of artificial strings with a controlled mutation rate (see Figures 16 and 17).

In some applications, including genome comparison, the user considers short matches to be noise. The precise length of such a match can then be discarded safely, as long as we record that the match at that position was short. Given an array $\mathrm{MS}_{S,T}$ and a user-defined threshold $\tau$, let a thresholded matching statistics array $\mathrm{MS}_{S,T,\tau}$ be such that $\mathrm{MS}_{S,T,\tau}[i] = \mathrm{MS}_{S,T}[i]$ if $\mathrm{MS}_{S,T}[i] \geq \tau$, and $\mathrm{MS}_{S,T,\tau}[i]$ equals an arbitrary (possibly negative) value smaller than $\tau$ otherwise.²…”
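To make the definition concrete, here is a minimal Python sketch of $\mathrm{MS}_{S,T,\tau}$. It rests on two assumptions that are not stated in the excerpt. First, it takes $\mathrm{MS}_{S,T}[i]$ to be the length of the longest prefix of $T[i..]$ that occurs as a substring of $S$. Second, it uses a hypothetical `sentinel` of $-1$ as the "arbitrary value smaller than $\tau$". Both function names are illustrative only. The matching statistics are computed naively with Python's substring test, which is fine for toy inputs but is not how a compressed index would do it. The sketch does rely on the standard property $\mathrm{MS}[i] \geq \mathrm{MS}[i-1] - 1$ to avoid restarting each extension from zero.

```python
from __future__ import annotations


def matching_statistics(S: str, T: str) -> list[int]:
    """Assumed convention: ms[i] = length of the longest prefix of T[i:]
    that occurs as a substring of S. Naive toy implementation."""
    ms: list[int] = []
    n = len(T)
    k = 0
    for i in range(n):
        # Since T[i-1 : i-1+k] occurs in S, T[i : i-1+k] does too,
        # so the match at i is at least k - 1 characters long.
        k = max(k - 1, 0)
        # Extend the match one character at a time while it still occurs in S.
        while i + k < n and T[i:i + k + 1] in S:
            k += 1
        ms.append(k)
    return ms


def thresholded_ms(ms: list[int], tau: int, sentinel: int = -1) -> list[int]:
    """Keep values >= tau; replace every shorter match with one
    hypothetical sentinel value that must be smaller than tau."""
    if sentinel >= tau:
        raise ValueError("sentinel must be smaller than tau")
    return [v if v >= tau else sentinel for v in ms]


if __name__ == "__main__":
    ms = matching_statistics("ACGTACGT", "ACGTTTACG")
    print(ms)                     # [4, 3, 2, 1, 1, 4, 3, 2, 1]
    print(thresholded_ms(ms, 3))  # [4, 3, -1, -1, -1, 4, 3, -1, -1]
```

Every sub-threshold entry collapses to the same sentinel, so the thresholded array never has more distinct values than the original. The sketch does not model any particular bitvector encoding, so it says nothing about how a given compressed representation would benefit.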