This paper presents Tagged Sub-optimal code (TSC), a new coding technique to speed up string matching over compressed databases on personal digital assistants (PDA). TSC is a variable-length sub-optimal code that supports minimal prefix property. It always determines its codeword boundary without traversing a tree or lookup table. TSC technique may be beneficial in many types of applications: speeding up string matching over compressed text, and speeding decoding process. This paper also presents two algorithms for string matching over compressed text using TSC (SCTT) and the Byte Pair Encoding (BPE) technique (SCTB).Several experiments were conducted to compare the performance of TSC, Byte Pair Encoding (BPE), and Huffman code. Several PDA databases with different record sizes were used: the well-known Calgary dataset and a set of small-sized PDA databases. Experimental results show that SCTT is almost twice as fast as the Huffmanbased algorithm. SCTT has also the same performance in search time as the search in uncompressed databases and is faster than the SCTB algorithm. For frequently updated PDA databases such as phone books, to-do list, and memos, SCTT is the recommended method regardless of the size of the average record length, since the time required to compress the updated records using BPE poses significant delays compared to TSC.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.