2003
DOI: 10.1007/978-3-540-39984-1_10

(S,C)-Dense Coding: An Optimized Compression Code for Natural Language Text Databases

Abstract: This work presents (s, c)-Dense Code, a new method for compressing natural language texts. This technique is a generalization of a previous compression technique called End-Tagged Dense Code that obtains better compression ratio as well as a simpler and faster encoding than Tagged Huffman. At the same time, (s, c)-Dense Code is a prefix code that maintains the most interesting features of Tagged Huffman Code with respect to direct search on the compressed text. (s, c)-Dense Coding retains all the eff…

Cited by 68 publications (52 citation statements)
References: 16 publications

“…Inverted indexes are designed to take advantage of a myriad of different compression techniques. As such, our baselines also support several state-of-the-art byte and word aligned compression algorithms [3,9,28,39,43]. So, when we report the space usage for an inverted index, the numbers are reported using compressed inverted indexes and compressed document collections.…”
Section: Space Usage (mentioning)
confidence: 99%
“…This dense coding, however, is interesting by itself as a bound for the compression that can be obtained with a Huffman code. In this section we present this coding and some of its properties, generalizing the previous proposal of [3]. It should be clear that a stop-cont coding is just a base-c numerical representation, with the exception that the last digit is between c and c + s − 1, i.e., the last digit is a base-s number that is distinguished from previous digits by adding c. Digits between 0 and c−1 are called "continuers" and those between c and c + s − 1 are called "stoppers".…”
Section: Dense Coding (mentioning)
confidence: 99%
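
The stopper/continuer description quoted above can be illustrated with a minimal Python sketch. It follows the quoted convention (continuers are digits 0 to c − 1, stoppers are digits c to c + s − 1). The function names `sc_encode` and `sc_decode`, the 0-based frequency rank, and the "subtract one per continuer digit" step that makes the code dense are assumptions made for illustration; they are not taken from the quoted text and should not be read as the authors' implementation.

```python
from typing import List


def sc_encode(rank: int, s: int, c: int) -> List[int]:
    """Digits of the codeword for a 0-based frequency rank (hypothetical helper).

    The codeword is zero or more continuers (0 .. c-1) followed by
    exactly one stopper (c .. c+s-1).
    """
    if rank < 0 or s < 1 or c < 1:
        raise ValueError("rank must be >= 0 and s, c must be >= 1")
    digits = [c + rank % s]  # last digit: a base-s value shifted by c
    x = rank // s
    while x > 0:
        x -= 1               # assumed dense-coding step: skip ranks already
                             # covered by shorter codewords
        digits.append(x % c)  # continuer digit, base c
        x //= c
    digits.reverse()         # most significant continuer first
    return digits


def sc_decode(digits: List[int], s: int, c: int) -> int:
    """Inverse of sc_encode: recover the rank from a full codeword."""
    x = 0
    for d in digits[:-1]:    # every digit except the last is a continuer
        x = x * c + d + 1
    return x * s + (digits[-1] - c)


if __name__ == "__main__":
    # Byte-oriented example with s + c = 256 (the values are an assumption).
    for r in (0, 191, 192, 12480):
        print(r, sc_encode(r, 192, 64))
```

With s = 192 and c = 64, ranks 0 to 191 get one-digit codewords, the next 64 × 192 ranks get two digits, and rank 12480 is the first to need three. Because only the last digit of a codeword is a stopper, codeword boundaries can be found in the compressed stream without decoding.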
“…In [3] we proposed Dense Coding as a more efficient alternative to Tagged Huffman Coding [14] for direct compressed text searching on natural language texts. This dense coding, however, is interesting by itself as a bound for the compression that can be obtained with a Huffman code.…”
Section: Dense Coding (mentioning)
confidence: 99%
“…The loss incurred by not using an optimal (Huffman) code is often tolerable, and other non-optimal variants with desirable features, such as faster processing and simplicity, have been suggested, for example Tagged Huffman codes [5], End-Tagged Dense codes [3] and (s, c)-Dense codes [2]. Similarly, the loss of optimality caused by moving to not fully sorted frequencies can also be acceptable in certain applications, for example when based on estimations rather than on actual counts.…”
Section: Introduction (mentioning)
confidence: 99%