In this paper, we present a new algorithm for text compression. The basic idea of our algorithm is to define a unique encryption or signature of each word in the dictionary by replacing certain characters in the word with a special character '*' and retaining a few characters so that the word is still retrievable. For any encrypted text the most frequently used character is '*', and standard compression algorithms can exploit this redundancy effectively. We advocate the following compression paradigm in this paper: given a compression algorithm A and a text T, we apply the same algorithm A to an encrypted text *T and retrieve the original text via a dictionary which maps the decompressed text *T to the original text T. We report better results for the most widely used compression algorithms, such as Huffman, LZW, arithmetic, Unix compress, and gnu-zip, with respect to a text corpus. The compression rates using these algorithms are much better than those of the dictionary-based methods reported in the literature. One basic assumption of our algorithm is that the system has access to a dictionary of words used in all the texts along with a corresponding "cryptic" dictionary. The cost of this dictionary is amortized over the compression savings for all the text files handled by the organization. If two organizations wish to exchange information using our compression algorithm, they must share a common dictionary. We compare our methods with other dictionary-based methods and present future research problems.
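The paradigm described above can be illustrated with a minimal Python sketch. It is not the paper's actual encoding scheme: the function names `build_cryptic_dictionary` and `transform`, the greedy rule for choosing which characters to retain, and the use of `zlib` as the algorithm A are all assumptions made for illustration. The sketch also assumes that dictionary words contain no '*' and that every token in the text appears in the shared dictionary.

```python
import zlib
from itertools import combinations


def build_cryptic_dictionary(words):
    """Assign each word a unique same-length signature made mostly of '*'.

    Hypothetical greedy rule: try the all-'*' signature first, then
    signatures that retain one original character, then two, and so on,
    taking the first signature not already used by another word.
    """
    encode = {}
    used = set()
    for word in words:
        if word in encode:
            continue
        n = len(word)
        for keep in range(n + 1):
            for positions in combinations(range(n), keep):
                sig = "".join(
                    c if i in positions else "*" for i, c in enumerate(word)
                )
                if sig not in used:
                    break
            else:
                continue  # every signature with `keep` letters is taken
            break
        encode[word] = sig
        used.add(sig)
    decode = {sig: word for word, sig in encode.items()}
    return encode, decode


def transform(text, mapping):
    """Replace each space-separated token that has an entry in `mapping`."""
    return " ".join(mapping.get(tok, tok) for tok in text.split(" "))


# Shared dictionary; earlier words receive signatures with more '*'.
words = ["the", "and", "cat", "car"]
encode, decode = build_cryptic_dictionary(words)

T = "the cat and the car"
star_T = transform(T, encode)                 # the encrypted text *T
compressed = zlib.compress(star_T.encode())   # algorithm A applied to *T

# Receiver side: decompress, then map *T back to T via the dictionary.
restored = transform(zlib.decompress(compressed).decode(), decode)
```

Here `star_T` is `"*** c** a** *** *a*"`, and `restored` equals the original `T`. Listing the dictionary in order of decreasing word frequency would give the most common words the signatures with the most '*' characters, which is the redundancy the compressor is meant to exploit; the sketch makes no claim about the resulting compression ratios.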