Variable-to-variable-length lossless codes are described using dual leaf-linked trees: one specifying the parsing of the source symbols into source words and the other specifying the formation of code words from code symbols. Compression exceeds entropy by the amount of the informational divergence between source words and code words, divided by the expected source-word length. The asymptotic optimality of Tunstall or Huffman codes derives from the bounding of divergence while the expected source-word length is made arbitrarily large. A heuristic extension scheme is asymptotically optimal but also acts to reduce the divergence by retaining those source words which are well matched to their corresponding code words.
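The relation between compression, entropy, and divergence stated above can be checked numerically on a toy code. The following Python sketch is only an illustration under stated assumptions: the binary source probabilities, the three source words, and their code words are hypothetical and not taken from this paper, the source is assumed memoryless, and each code word is assumed to have probability equal to the code-alphabet size raised to minus its length.

```python
import math

# Hypothetical memoryless binary source (not from the paper).
SYMBOL_PROBS = {"a": 0.7, "b": 0.3}
# Hypothetical mapping: leaves of a complete parse tree -> leaves of a
# complete binary code tree.
CODE = {"aa": "0", "b": "10", "ab": "11"}
BETA = 2  # assumed size of the code alphabet


def word_prob(word):
    """Probability of a source word under the memoryless source."""
    return math.prod(SYMBOL_PROBS[s] for s in word)


def encode(source):
    """Follow source symbols to a parse-tree leaf, then emit its code word."""
    out, word = [], ""
    for symbol in source:
        word += symbol
        if word in CODE:  # reached a leaf of the parse tree
            out.append(CODE[word])
            word = ""
    if word:
        raise ValueError("source sequence ends inside a source word")
    return "".join(out)


def decode(channel):
    """Follow code symbols to a code-tree leaf, then emit its source word."""
    inverse = {c: w for w, c in CODE.items()}
    out, word = [], ""
    for symbol in channel:
        word += symbol
        if word in inverse:  # reached a leaf of the code tree
            out.append(inverse[word])
            word = ""
    if word:
        raise ValueError("channel sequence ends inside a code word")
    return "".join(out)


def rate_terms():
    """Return (rate, symbol entropy, divergence, expected source-word length)."""
    p = {w: word_prob(w) for w in CODE}
    # Assumption: code-word probability is BETA ** (-code-word length).
    q = {w: BETA ** -len(c) for w, c in CODE.items()}
    mean_source_len = sum(p[w] * len(w) for w in CODE)
    mean_code_len = sum(p[w] * len(CODE[w]) for w in CODE)
    entropy = -sum(pr * math.log(pr, BETA) for pr in SYMBOL_PROBS.values())
    divergence = sum(p[w] * math.log(p[w] / q[w], BETA) for w in CODE)
    rate = mean_code_len / mean_source_len  # code symbols per source symbol
    return rate, entropy, divergence, mean_source_len


rate, entropy, divergence, mean_len = rate_terms()
# Both printed values should agree up to floating-point rounding.
print(rate, entropy + divergence / mean_len)
```

In this sketch the rate in code symbols per source symbol coincides with the symbol entropy plus the divergence divided by the expected source-word length, and `decode(encode(x))` recovers any `x` that is a concatenation of complete source words.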
DUAL-TREE ENTROPY CODING [3][4]

Variable-length source words are specified by a complete, proper, labelled, $\alpha$-ary parse tree. Its symbol alphabet $S_w$, with $|S_w| = \alpha$, is that of the source sequence $w$. Its index alphabet is $I_w = \{0, \ldots, \gamma' - 1\}$. To parse one source word at the encoder, symbols of the source sequence specify a path from the root node of the parse tree until a leaf node is reached. To reconstruct one source word at the decoder, a leaf node is specified and the source symbols are recreated by following its unique path from the parse tree root.

Variable-length code words are similarly specified by a complete, proper, labelled, $\beta$-ary code tree. Its symbol alphabet $S_z$, with $|S_z| = \beta$, is that of the channel sequence $z$. Its index alphabet is $I_z = \{0, \ldots, \gamma - 1\}$. Code words are parsed from the channel sequence by the decoder or constructed by the encoder using root-to-leaf paths in the code tree.

A one-to-one interface is established between the $\gamma'$ words of the parse tree and a subset of the $\gamma$ words of the code tree. To avoid unused code words, it is assumed that the parse tree has $\gamma - \gamma'$ fake leaf nodes representing arbitrary source words of t

This work