We introduce LG-encoding, a novel approach to text encoding that rearranges the letters of a text in anticipation of improved compression performance. Our technique brings together the repeating letters in a word, increasing the redundancy that the subsequent compression algorithm can exploit. The encoding process introduces no significant overhead, and it is easily reversible because it only involves repositioning the letters in a text. We evaluate LG-encoding on text in four languages (English, French, German, and Spanish), followed by each of a set of well-known compression algorithms: Arithmetic Coding, Huffman Coding, the Burrows-Wheeler Transform (BWT), and Prediction by Partial Matching (PPM). The results are promising: compression rates improve substantially when Arithmetic Coding or Huffman Coding follows LG-encoding. We also propose using our method in large data repositories, such as cloud storage, because shuffling the letters of words in a text also provides a significant level of security.
Index Terms: Text encoding, lossless text compression.