“…In our previous work, we found that SEEKR performed best when the length of the lncRNA or lncRNA fragment being studied was similar to 4^k, i.e. the total number of possible k-mers at k-mer length k. In tests of Xist-like repressive activity, we found that comparisons of lncRNAs using k-mer lengths of k≥7 underperformed relative to comparisons using smaller k-mer lengths, owing to the fact that most annotated lncRNAs are much less than 4^7 (16384) nucleotides long, and k-mer profiles of individual lncRNAs at k≥7 (≥16384 possible k-mers) are dominated by "0" values (Kirk et al, 2018). Based on this observation, and because Repeats A and B, two essential repetitive regions within Xist (Almeida et al, 2017; Hoki et al, 2009; Pintacuda et al, 2017; Royce-Tolland et al, 2010; Wutz et al, 2002), are each about 4^4 (256) nucleotides in length, we reasoned that k-mer profiles at k=4 (4^4=256 possible k-mers) would provide a reasonable estimate of sequence complexity for the repeats without being dominated by "0" values.…”
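
The sparsity argument in the quoted passage can be illustrated with a minimal sketch. This is not SEEKR's implementation: the function names `kmer_counts` and `zero_fraction` are hypothetical, and the 256-nucleotide random sequence is only a stand-in for a fragment of roughly the length of Repeat A or B. The sketch counts overlapping k-mers over the alphabet A, C, G, T and reports what fraction of the 4^k possible k-mers never occur.

```python
import random
from itertools import product


def kmer_counts(seq, k):
    """Count overlapping k-mers in seq; every possible ACGT k-mer gets an entry."""
    counts = {"".join(p): 0 for p in product("ACGT", repeat=k)}
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # windows containing non-ACGT characters are skipped
            counts[kmer] += 1
    return counts


def zero_fraction(seq, k):
    """Fraction of the 4**k possible k-mers that have a count of zero in seq."""
    counts = kmer_counts(seq, k)
    zeros = sum(1 for c in counts.values() if c == 0)
    return zeros / len(counts)


if __name__ == "__main__":
    # Assumption: a random 256-nt sequence stands in for a repeat-sized fragment.
    rng = random.Random(0)
    seq = "".join(rng.choice("ACGT") for _ in range(256))
    for k in (4, 5, 6, 7):
        print(f"k={k}  possible k-mers={4 ** k}  zero fraction={zero_fraction(seq, k):.3f}")
```

A sequence of length L contains at most L - k + 1 overlapping k-mers, so at least 1 - (L - k + 1)/4^k of the profile entries must be zero. For L = 256 and k = 7 that lower bound is 1 - 250/16384, or more than 98%, whereas at k = 4 the number of windows (253) is comparable to the number of possible k-mers (256).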