We focus on the efficient search for the bit strings most similar to a given query in the Hamming space. The Hamming distance can be lower-bounded by the difference in the "number of ones" in the compared strings, i.e., the difference of their weights. Recently, this property has been used successfully by the Hamming Weight Tree (HWT) indexing structure. We propose modifications of the bit strings that preserve pairwise Hamming distances but tighten these lower bounds, making query evaluation with the HWT several times faster. We also show that unbalanced bit strings, recently reported to provide search quality similar to that of the traditionally used balanced bit strings, can be indexed more efficiently with the HWT. Combined with the distance-preserving modifications, HWT query evaluation can be more than one order of magnitude faster than the HWT baseline.
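To make the weight-based lower bound concrete, the following minimal Python sketch checks that the absolute difference of weights never exceeds the Hamming distance, and that a transformation preserving pairwise distances can still change how tight the bound is. The XOR with a fixed mask used here is only an illustrative assumption of one distance-preserving transformation; it is not claimed to be the modification proposed in this paper, and the helper names (`weight`, `hamming`, `weight_lower_bound`) and the mask value are hypothetical.

```python
def weight(x: int) -> int:
    """Number of bits set to 1 in x (its Hamming weight)."""
    return bin(x).count("1")


def hamming(a: int, b: int) -> int:
    """Hamming distance: number of positions where a and b differ."""
    return weight(a ^ b)


def weight_lower_bound(a: int, b: int) -> int:
    """Lower bound on hamming(a, b) given only the two weights."""
    return abs(weight(a) - weight(b))


query, obj = 0b1100, 0b0011
mask = 0b1100  # hypothetical mask, chosen only for illustration

# The bound holds but is loose here: both strings have weight 2.
assert weight_lower_bound(query, obj) <= hamming(query, obj)

# XOR with the same mask leaves the pairwise distance unchanged ...
assert hamming(query ^ mask, obj ^ mask) == hamming(query, obj)

# ... while the weights, and hence the lower bound, can change.
print(weight_lower_bound(query, obj), weight_lower_bound(query ^ mask, obj ^ mask))
```

In this toy case the bound is 0 before the transformation and equals the true distance of 4 after it, which is the kind of tightening that lets a weight-based index discard more candidates without computing their full distances.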
Keywords-Similarity search, Hamming space, Hamming Weight Tree, Lower bound in the Hamming space
1 i.e., bit strings in which the ratio of bits set to 1 is different from one half
of bit strings o ∈ X