Timothy C. Bell scite author profile

The best schemes for text compression use large models to help them predict which characters will come next. The actual next characters are coded with respect to the prediction, resulting in compression of information. Models are best formed adaptively, based on the text seen so far. This paper surveys successful strategies for adaptive modeling that are suitable for use in practical text compression systems. The strategies fall into three main classes: finite-context modeling, in which the last few characters are used to condition the probability distribution for the next one; finite-state modeling, in which the distribution is conditioned by the current state (and which subsumes finite-context modeling as an important special case); and dictionary modeling, in which strings of characters are replaced by pointers into an evolving dictionary. A comparison of different methods on the same sample texts is included, along with an analysis of future research directions.
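As an illustration of the finite-context idea in this abstract, the following is a minimal sketch, not the paper's method. It conditions on the last `order` characters, updates its counts adaptively as text is seen, and sums the ideal code length of each character, -log2(p). The function name `adaptive_code_length`, the add-one smoothing, and the fixed `alphabet_size` are assumptions made for this example; practical schemes handle unseen characters more carefully.

```python
import math
from collections import defaultdict


def adaptive_code_length(text, order=2, alphabet_size=256):
    """Ideal code length in bits of `text` under an adaptive order-`order` model.

    Hypothetical helper for illustration: probabilities come from counts of
    what followed each context so far, with add-one smoothing (an assumption).
    """
    counts = defaultdict(lambda: defaultdict(int))  # context -> char -> count
    totals = defaultdict(int)                       # context -> total count
    bits = 0.0
    for i, ch in enumerate(text):
        # The context is the last `order` characters (shorter at the start).
        context = text[max(0, i - order):i]
        # Predict before updating, so a decoder could mirror the same steps.
        p = (counts[context][ch] + 1) / (totals[context] + alphabet_size)
        bits += -math.log2(p)
        # Adapt the model with the character just coded.
        counts[context][ch] += 1
        totals[context] += 1
    return bits


if __name__ == "__main__":
    sample = "ab" * 200
    print(adaptive_code_length(sample), "bits vs", 8 * len(sample), "bits raw")
```

Because the model predicts each character before updating its counts, a decoder that has seen the same prefix can rebuild identical probabilities, which is what makes adaptive modeling usable without transmitting the model.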

The Burrows-Wheeler Transform: Data Compression, Suffix Arrays, and Pattern Matching

Adjeroh

2008

Textual image compression: two-stage lossy/lossless encoding of textual images

et al. 1994

A two-stage method for compressing bilevel images is described that is particularly effective for images containing repeated sub-images, notably text. In the first stage, connected groups of pixels, corresponding approximately to individual characters, are extracted from the image. These are matched against an adaptively-constructed library of patterns seen so far, and the resulting sequence of symbol identification numbers is coded and transmitted. From this information, along with the library itself and the offsets from one mark to the next, an approximate image can be reconstructed. The result is a lossy method of compression that outperforms other schemes. The second stage employs the reconstructed image as an aid for encoding the original image using a statistical context-based compression technique. This yields a total bandwidth for exact transmission appreciably undercutting that required by other lossless binary image compression methods. Taken together, the lossy and lossless methods provide an effective two-stage progressive transmission capability for textual images which has application for legal, medical and historical purposes, and to archiving in general.
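The first (lossy) stage described in this abstract can be sketched roughly as follows, under strong simplifying assumptions. Marks are taken as already extracted and are represented as tuples of equal-length strings of '0' and '1'. Matching uses a plain pixel-difference threshold, and the offsets between marks are omitted. The names `hamming`, `encode_marks`, `reconstruct`, and `max_diff` are hypothetical and do not come from the paper.

```python
def hamming(a, b):
    """Number of differing pixels between two bitmaps, or None if sizes differ."""
    if len(a) != len(b) or any(len(x) != len(y) for x, y in zip(a, b)):
        return None
    return sum(p != q for x, y in zip(a, b) for p, q in zip(x, y))


def encode_marks(marks, max_diff=1):
    """Match each mark against a library built adaptively from earlier marks.

    Returns (library, symbols): symbols[i] is the index of the library
    pattern standing in for marks[i].
    """
    library, symbols = [], []
    for mark in marks:
        for idx, pattern in enumerate(library):
            d = hamming(mark, pattern)
            if d is not None and d <= max_diff:
                symbols.append(idx)      # close enough: reuse earlier pattern
                break
        else:
            library.append(mark)         # no match: add a new library pattern
            symbols.append(len(library) - 1)
    return library, symbols


def reconstruct(library, symbols):
    """Approximate (lossy) reconstruction: each mark becomes its library pattern."""
    return [library[s] for s in symbols]


if __name__ == "__main__":
    a = ("010", "111", "010")
    a_noisy = ("010", "111", "011")      # differs from `a` by one pixel
    b = ("111", "101", "111")
    lib, syms = encode_marks([a, b, a_noisy, b])
    print(syms, reconstruct(lib, syms) == [a, b, a, b])
```

In this sketch the noisy mark is replaced by its library pattern, which is where the loss comes from. The second, lossless stage in the abstract would then code the original image using the reconstruction as context; that stage is not shown here.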

DNA sequence compression using the Burrows-Wheeler Transform

Adjeroh

Zhang

Mukherjee

et al.

We investigate off-line dictionary-oriented approaches to DNA sequence compression, based on the Burrows-Wheeler Transform (BWT). The preponderance of short repeating patterns is an important phenomenon in biological sequences. Here, we propose off-line methods to compress DNA sequences that exploit the different repetition structures inherent in such sequences. Repetition analysis is performed based on the relationship between the BWT and important pattern matching data structures, such as the suffix tree and suffix array. We discuss how the proposed approach can be incorporated in the BWT compression pipeline.
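For readers unfamiliar with the transform, here is a minimal sketch of the textbook BWT and its relationship to the suffix array, not the compression method proposed in the paper. It uses naive sorting, which is far too slow for real genomes. The `$` sentinel and the function names `bwt`, `bwt_via_suffix_array`, and `inverse_bwt` are assumptions made for this example.

```python
def bwt(text, sentinel="$"):
    """BWT by sorting all rotations of text + sentinel (naive, for illustration)."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in text")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(r[-1] for r in rotations)


def bwt_via_suffix_array(text, sentinel="$"):
    """Same transform from the suffix array: BWT[i] = s[SA[i] - 1]."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in text")
    s = text + sentinel
    suffix_array = sorted(range(len(s)), key=lambda i: s[i:])
    # For SA[i] == 0 the index -1 wraps around to the sentinel, as required.
    return "".join(s[i - 1] for i in suffix_array)


def inverse_bwt(last, sentinel="$"):
    """Invert the BWT by repeatedly prepending the last column and sorting."""
    table = [""] * len(last)
    for _ in range(len(last)):
        table = sorted(last[i] + table[i] for i in range(len(last)))
    row = next(r for r in table if r.endswith(sentinel))
    return row[:-1]


if __name__ == "__main__":
    dna = "GATTACAGATTACA"
    transformed = bwt(dna)
    print(transformed, inverse_bwt(transformed) == dna)
```

The sentinel sorts before the letters A, C, G, and T, so ordering the suffixes gives the same result as ordering the rotations. That equivalence is the link between the BWT and the suffix array that the abstract refers to.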

Timothy C. Bell

Managing Gigabytes: Compressing and Indexing Documents and Images

Modeling for text compression