A two-stage method for compressing bilevel images is described. It is particularly effective for images that contain repeated sub-images, notably text. In the first stage, connected groups of pixels, corresponding approximately to individual characters, are extracted from the image. These are matched against an adaptively constructed library of the patterns seen so far, and the resulting sequence of symbol identification numbers is coded and transmitted. From this sequence, together with the library itself and the offsets from one mark to the next, an approximate image can be reconstructed. The result is a lossy compression method that outperforms other schemes. The second stage uses the reconstructed image as an aid for encoding the original image with a statistical, context-based compression technique. The total bandwidth required for exact transmission is appreciably lower than that required by other lossless binary image compression methods. Taken together, the lossy and lossless methods provide an effective two-stage progressive transmission capability for textual images, with applications in legal, medical and historical work, and in archiving generally.
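The first (lossy) stage can be illustrated with a minimal sketch, under several assumptions that are not taken from the method itself. Marks are extracted as 8-connected black components. The matching rule (same bounding-box size and at most `max_errors` differing pixels) is a hypothetical stand-in for whatever template-matching criterion the method actually uses. Offsets are taken between the top-left corners of consecutive marks. The names `extract_marks`, `mismatch`, `encode`, `reconstruct` and `max_errors` are illustrative only. Symbol numbers and offsets are returned as plain lists rather than being entropy-coded, and the second, context-based lossless stage is not shown.

```python
from collections import deque
from typing import Optional

# Assumed representation: an image is a list of rows, 1 = black, 0 = white.
Bitmap = tuple[tuple[int, ...], ...]


def extract_marks(img: list[list[int]]) -> list[tuple[int, int, Bitmap]]:
    """Extract 8-connected black components as (top, left, bitmap), in raster order."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    marks = []
    for y in range(h):
        for x in range(w):
            if not img[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill over the 8 neighbours of each black pixel.
            pixels = set()
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                pixels.add((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            top, bottom = min(r for r, _ in pixels), max(r for r, _ in pixels)
            left, right = min(c for _, c in pixels), max(c for _, c in pixels)
            # Crop to the bounding box, keeping only this component's own pixels.
            bitmap = tuple(
                tuple(1 if (r, c) in pixels else 0 for c in range(left, right + 1))
                for r in range(top, bottom + 1)
            )
            marks.append((top, left, bitmap))
    return marks


def mismatch(a: Bitmap, b: Bitmap) -> Optional[int]:
    """Count differing pixels; None if the sizes differ (hypothetical matching rule)."""
    if len(a) != len(b) or len(a[0]) != len(b[0]):
        return None
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def encode(img, max_errors=2):
    """Lossy stage: return (library, symbol numbers, offsets between consecutive marks)."""
    library, symbols, offsets = [], [], []
    prev_top, prev_left = 0, 0
    for top, left, bitmap in extract_marks(img):
        best, best_err = None, None
        for i, pattern in enumerate(library):
            err = mismatch(pattern, bitmap)
            if err is not None and err <= max_errors and (best_err is None or err < best_err):
                best, best_err = i, err
        if best is None:
            # Unseen shape: extend the adaptive library (it would be transmitted here).
            library.append(bitmap)
            best = len(library) - 1
        symbols.append(best)
        offsets.append((top - prev_top, left - prev_left))
        prev_top, prev_left = top, left
    return library, symbols, offsets


def reconstruct(library, symbols, offsets, height, width):
    """Rebuild the approximate image by pasting library patterns at the decoded positions."""
    out = [[0] * width for _ in range(height)]
    top = left = 0
    for sym, (dy, dx) in zip(symbols, offsets):
        top, left = top + dy, left + dx
        for r, row in enumerate(library[sym]):
            for c, bit in enumerate(row):
                if bit:
                    out[top + r][left + c] = 1
    return out


# Two identical marks, then a third that differs from them in one pixel.
SAMPLE = [
    [1, 1, 0, 0, 1, 1, 0, 1, 1],
    [1, 0, 0, 0, 1, 0, 0, 1, 1],
    [1, 1, 0, 0, 1, 1, 0, 1, 1],
]

lib, syms, offs = encode(SAMPLE, max_errors=2)
print(len(lib), syms, offs)  # 1 [0, 0, 0] [(0, 0), (0, 4), (0, 3)]
```

With `max_errors=0` every distinct shape receives its own library entry and the reconstruction is exact; raising the threshold shrinks the library at the cost of a few wrong pixels, which in this sketch is the residue a second, context-based stage would be left to correct.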