1993
DOI: 10.1002/(sici)1097-4571(199310)44:9<508::aid-asi2>3.0.co;2-a

Data compression in full-text retrieval systems

Abstract: When data compression is applied to full-text retrieval systems, intricate relationships emerge between the amount of compression, access speed, and computing resources required. We propose compression methods, and explore corresponding tradeoffs, for all components of static full-text systems such as text databases on CD-ROM. These components include lexical indexes, inverted files, bitmaps, signature files, and the main text itself. Results are reported on the application of the methods to several substantia…

Cited by 46 publications (29 citation statements)
References 31 publications

“…In the absence of compression four bytes and two bytes respectively might be allocated for the d and f_{d,t} values, that is, six bytes for each (d, f_{d,t}) pair. Using compression the space required can be reduced to about one byte per pair [1]. On the 2 GB TREC collection, described below, these methods compress the inverted file from 1100 MB to 184 MB, an irresistible saving.…”
Section: Document Databases (mentioning; confidence: 99%)
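
Taking the excerpt's figures at face value (and reading its sizes as megabytes), a quick check shows that the per-pair claim and the whole-index claim agree: six bytes falling to about one byte per pair is a factor of about six, and so is 1100 MB falling to 184 MB.

```latex
\begin{aligned}
\text{uncompressed pair} &= \underbrace{4}_{d} + \underbrace{2}_{f_{d,t}} = 6~\text{bytes}\\
\frac{1100~\text{MB}}{184~\text{MB}} &\approx 5.98,
\qquad 1 - \frac{184}{1100} \approx 0.833
\end{aligned}
```

That is, the compressed inverted file is roughly one sixth of its original size, a reduction of about 83%.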
“…Without compression, an inverted file can easily be as large or larger than the text it indexes. Compression results in a net space reduction of as much as 80% of the inverted file size [1], but even with fast decompression (decoding at approximately 400,000 numbers per second on a Sun Sparc 10), it involves a substantial overhead on processing time.…”
Section: Introduction (mentioning; confidence: 99%)
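
Part of the decoding overhead mentioned here comes from how compressed inverted lists are commonly stored: as d-gaps, the differences between successive document numbers, which are small and therefore cheap to code. Recovering the document numbers then means decoding every gap in order and keeping a running sum. A minimal sketch of the gap transform and its inverse follows; the function names and sample postings are hypothetical.

```python
from itertools import accumulate


def to_dgaps(doc_ids):
    """Convert a strictly increasing list of document numbers (>= 1) into d-gaps."""
    gaps = []
    prev = 0
    for d in doc_ids:
        if d <= prev:
            raise ValueError("document numbers must be strictly increasing and >= 1")
        gaps.append(d - prev)
        prev = d
    return gaps


def from_dgaps(gaps):
    """Recover document numbers with a running (prefix) sum over the gaps."""
    return list(accumulate(gaps))


postings = [3, 8, 9, 11, 12, 13, 17]   # hypothetical document numbers for one term
gaps = to_dgaps(postings)
print(gaps)  # [3, 5, 1, 2, 1, 1, 4]
assert from_dgaps(gaps) == postings
```

Because each document number depends on all the gaps before it, a list cannot be entered part-way through without extra structure, which is one reason decoding cost grows with list length.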
“…However, because databases are divided into records that must be independently decompressible, adaptive techniques are generally not effective. Similarly, arithmetic coding is in general the preferred coding technique; but it is slow for database applications (Bell et al, 1993).…”
Section: Database Compression (mentioning; confidence: 99%)
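
One common way around the problem this excerpt describes is a semi-static model: gather symbol statistics over the whole database in a first pass, build one fixed code, and then code every record separately with it, so any single record can be decoded without touching the others. The sketch below swaps in Huffman coding for illustration (it is not the arithmetic coder the excerpt mentions), uses characters as symbols where real systems often use words, and its function names and sample records are hypothetical.

```python
import heapq
from collections import Counter
from itertools import count


def build_code(records):
    """Pass 1: one static Huffman code from symbol frequencies over all records."""
    freq = Counter(ch for rec in records for ch in rec)
    tie = count()  # unique tie-breaker so heapq never compares the dicts
    heap = [(f, next(tie), {sym: ""}) for sym, f in freq.items()]
    heapq.heapify(heap)
    if len(heap) == 1:  # degenerate one-symbol alphabet still needs 1 bit
        return {sym: "0" for sym in heap[0][2]}
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


def encode_record(rec, code):
    """Pass 2: each record is coded on its own with the shared static code."""
    return "".join(code[ch] for ch in rec)


def decode_record(bits, code):
    """Decode one record using only the shared code; no other record is needed."""
    inverse = {c: s for s, c in code.items()}
    out, cur = [], ""
    for b in bits:
        cur += b
        if cur in inverse:  # Huffman codes are prefix-free, so first match is correct
            out.append(inverse[cur])
            cur = ""
    if cur:
        raise ValueError("trailing bits do not form a complete codeword")
    return "".join(out)


records = ["the cat sat", "the dog ran", "a cat ran"]
code = build_code(records)
compressed = [encode_record(r, code) for r in records]
assert decode_record(compressed[2], code) == "a cat ran"  # random access to one record
```

An adaptive coder would instead have to restart its model at the start of every record, losing most of its statistics on short records; the semi-static design pays for a stored model once and keeps every record independently decodable.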
“…We have used the Elias gamma codes to encode each count w and Golomb codes to represent each sequence of offsets. These techniques are a variation on techniques used for inverted file compression, which has been successfully applied to large text databases (Bell et al, 1993) and to genomic databases (Williams and Zobel, 1996a; Williams and Zobel, 1996b).…”
Section: Direct Coding (mentioning; confidence: 99%)
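
A minimal sketch of the two codes named in this excerpt, for integers x ≥ 1. The Elias gamma code writes 1 + ⌊log₂ x⌋ in unary and then the low ⌊log₂ x⌋ bits of x; the Golomb code with parameter b writes the quotient ⌊(x − 1)/b⌋ in unary and the remainder in truncated binary. Bits are kept as Python strings for readability rather than packed into words, the names are hypothetical, and the parameter rule b ≈ 0.69 × (mean gap) is a commonly used approximation included here as an assumption.

```python
import math


def unary(n):
    """n >= 1 coded as (n - 1) one-bits followed by a zero-bit."""
    return "1" * (n - 1) + "0"


def _bits(value, width):
    """value as a zero-padded binary string of the given width ('' if width is 0)."""
    return format(value, "b").zfill(width) if width > 0 else ""


def gamma_encode(x):
    """Elias gamma: unary(1 + floor(log2 x)), then the low floor(log2 x) bits of x."""
    if x < 1:
        raise ValueError("gamma codes are defined for x >= 1")
    n = x.bit_length() - 1
    return unary(n + 1) + _bits(x - (1 << n), n)


def golomb_encode(x, b):
    """Golomb with parameter b: unary quotient, truncated-binary remainder."""
    if x < 1 or b < 1:
        raise ValueError("need x >= 1 and b >= 1")
    q, r = divmod(x - 1, b)
    k = (b - 1).bit_length()  # ceil(log2 b)
    t = (1 << k) - b          # remainders below t need only k - 1 bits
    tail = _bits(r, k - 1) if r < t else _bits(r + t, k)
    return unary(q + 1) + tail


class BitReader:
    """Sequential reader over a string of '0'/'1' characters."""

    def __init__(self, bits):
        self.bits = bits
        self.pos = 0

    def bit(self):
        b = int(self.bits[self.pos])
        self.pos += 1
        return b

    def read(self, width):
        v = 0
        for _ in range(width):
            v = (v << 1) | self.bit()
        return v

    def unary(self):
        n = 1
        while self.bit() == 1:
            n += 1
        return n


def gamma_decode(rd):
    n = rd.unary() - 1
    return (1 << n) | rd.read(n)


def golomb_decode(rd, b):
    q = rd.unary() - 1
    k = (b - 1).bit_length()
    t = (1 << k) - b
    if k == 0:  # b == 1: pure unary, no remainder bits
        r = 0
    else:
        v = rd.read(k - 1)
        r = v if v < t else ((v << 1) | rd.bit()) - t
    return q * b + r + 1


def golomb_parameter(num_docs, num_postings):
    """Assumed rule of thumb: b ~ ln(2) * mean gap = 0.69 * N / f_t."""
    return max(1, math.ceil(math.log(2) * num_docs / num_postings))


gaps = [3, 5, 1, 2, 1, 1, 4]  # hypothetical d-gaps for one term
b = golomb_parameter(num_docs=20, num_postings=len(gaps))
stream = "".join(golomb_encode(g, b) for g in gaps)
rd = BitReader(stream)
assert [golomb_decode(rd, b) for _ in gaps] == gaps
```

Gamma codes need no parameter and suit small counts whose distribution is not known in advance, while Golomb codes are tuned per list through b, which fits gaps that are roughly geometrically distributed.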
“…On the two gigabyte TREC collection these techniques compress the inverted file from 1000 megabytes to 135 megabytes, a dramatic saving. For this reason, if the information retrieval system is to be available on CD-ROM, and if we wish to maximise the amount of information stored on each disk, we should employ compression of both the index and also the stored text [2,3,9]. This is the environment that we consider here.…”
Section: Document Databases (mentioning; confidence: 99%)