Researchers have proposed using hardware data compression units within the memory hierarchies of microprocessors to improve performance, energy efficiency, and functionality. However, most past work, and work on cache compression in particular, has made unsubstantiated assumptions about the performance, power consumption, and area overheads of the required compression hardware. We present a lossless compression algorithm designed for on-line memory hierarchy compression, and for cache compression in particular. We reduced our algorithm to a register-transfer-level hardware implementation, which permits estimation of its performance, power consumption, and area. We present experimental results comparing our work to previous work.