Data compression using long common strings

Bentley, Jon Louis; McIlroy, D.

doi:10.1109/dcc.1999.755678

Cited by 44 publications

(17 citation statements)

References 11 publications


“…Most of these offline algorithms proceed in a greedy manner, selecting in each iteration one repeated word w according to a score function and replacing all the (non-overlapping) occurrences of the repeat w in the whole grammar by a new terminal N and adding the new production N → w to the grammar. Different heuristics have been used to choose the repeat: the most frequent one [7], the longest [8] and the one that reduces the most the size of the resulting grammar (COMPRESSIVE [9]). GREEDY [10] belongs to this last family but the score used for choosing the words is oriented toward directly optimizing the number of bits needed to encode the grammar rather than minimizing its size.…”

Section: Introduction (mentioning)

confidence: 99%

The Smallest Grammar Problem as Constituents Choice and Minimal Grammar Parsing

et al. 2011


Abstract: The smallest grammar problem, namely finding a smallest context-free grammar that generates exactly one sequence, is of practical and theoretical importance in fields such as Kolmogorov complexity, data compression and pattern discovery. We propose a new perspective on this problem by splitting it into two tasks: (1) choosing which words will be the constituents of the grammar and (2) searching for the smallest grammar given this set of constituents. We show how to solve the second task in polynomial time by parsing longer constituents with smaller ones. We propose new algorithms based on classical practical algorithms that use this optimization to find small grammars. Our algorithms consistently find smaller grammars on a classical benchmark, reducing the size by 10% in some cases. Moreover, our formulation allows us to define interesting bounds on the number of small grammars and to empirically compare different grammars of small size.
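The greedy offline scheme described in the citation statement above (pick one repeated word by a score function, replace its non-overlapping occurrences with a new symbol, add a production for it) can be sketched roughly as follows. This is a minimal illustration, not the code of any of the cited algorithms: the function names `repeats`, `greedy_grammar` and `expand` are hypothetical, repeats are searched only in the start rule rather than in the whole grammar, and the three score functions are simplified stand-ins for the "most frequent", "longest" and "largest size reduction" heuristics.

```python
from collections import defaultdict

def repeats(seq, min_len=2):
    """Map each repeated word (as a tuple) to its non-overlapping start positions."""
    occ = defaultdict(list)
    n = len(seq)
    for length in range(min_len, n // 2 + 1):
        for i in range(n - length + 1):
            w = tuple(seq[i:i + length])
            # Keep occurrences left to right, skipping any that overlap the previous one.
            if not occ[w] or i >= occ[w][-1] + length:
                occ[w].append(i)
    return {w: pos for w, pos in occ.items() if len(pos) >= 2}

def greedy_grammar(text, score):
    """Repeatedly replace the best-scoring repeat with a new symbol.

    Simplification (assumption): only the start rule is scanned for repeats,
    not the right-hand sides of the rules created so far.
    """
    seq, rules = list(text), {}
    while True:
        cands = repeats(seq)
        if not cands:
            break
        w = max(cands, key=lambda c: score(c, cands[c]))
        name = f"N{len(rules)}"  # multi-character name, distinct from 1-char terminals
        rules[name] = list(w)
        starts, out, i = set(cands[w]), [], 0
        while i < len(seq):
            if i in starts:
                out.append(name)
                i += len(w)
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules

def expand(symbols, rules):
    """Rewrite every nonterminal recursively to recover the original text."""
    return "".join(expand(rules[s], rules) if s in rules else s for s in symbols)

# Simplified score functions (assumptions, not the exact published criteria).
most_frequent = lambda w, pos: len(pos)
longest = lambda w, pos: len(w)
size_reduction = lambda w, pos: len(pos) * (len(w) - 1) - len(w)

start, rules = greedy_grammar("abracadabra abracadabra", longest)
print(start, rules)
```

Swapping the `score` argument is the only change needed to move between the heuristics, which is the point the citing paper makes about this family of algorithms. The brute-force repeat search is cubic and only suitable for toy inputs.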


“…edu/homes/stelo/Off-line/. The file is artificially obtained by concatenating with itself, in an attempt to probe into extreme cases of intersequence correlation [21]. The last two families (8 and 9) are a segment of all the upstream regions of the yeast and, thus, not strongly related.…”

[Table 5: Comparing OFF-LINE with Other Compression Programs on the Chromosomes of the Yeast]

Section: Results (mentioning)

confidence: 99%

“…On such inputs, the approach presented here yields scores that are not only better than those of any other method, but also improve increasingly with increasing input size. This is to be attributed to a certain ability to capture distant relationships among the sequences in a family, a feature the merits of which were dramatically exposed in the recent paper [21].…”

Section: Introduction (mentioning)

confidence: 99%

Off-line compression by greedy textual substitution

Apostolico

Lonardi

2000

Proc. IEEE


“…Its implementations include Xdelta and open-vcdiff. Both of them divide the old version into chunks, and use the dictionary for the chunk fingerprints [9]. As a more popular tool, Xdelta optimizes the generated instructions and prioritizes speed over compression ratio [10].…”

Section: A Incremental Update Methods (mentioning)

confidence: 99%

Incremental updates based on graph theory for consumer electronic devices

Chen

Jiang

et al. 2015

IEEE Trans. Consumer Electron.


This paper presents a method for incremental updating on consumer electronic devices, called differential compression based on Dijkstra algorithm (DDIFF). It describes the similarities between the old and new versions as a directed weighted graph. In the graph, the shortest path between the start and end vertices corresponds to the minimal delta. As a result, the delta-encoding problem is reduced to the single-source shortest path problem. Experiments show that the proposed method is feasible, and average data transmission saving is as high as 69.3%. In comparison with the existing methods, DDIFF constructs the minimal patch, and the patch costs less time to apply.
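The chunk-based approach mentioned in the citation statement above (divide the old version into chunks and keep a dictionary of chunk fingerprints) can be illustrated with a small delta encoder. This is a sketch under stated assumptions and not the algorithm of Xdelta, open-vcdiff or DDIFF: `make_delta` and `apply_delta` are hypothetical names, the `("copy", offset, length)` / `("add", bytes)` instruction format is invented for the example, and Python's built-in dictionary hashing of the raw chunk stands in for an explicit fingerprint function.

```python
def make_delta(old: bytes, new: bytes, block: int = 4):
    """Encode `new` as copy/add instructions against `old` (illustrative only)."""
    # Index each aligned chunk of the old version; the dict's hash of the
    # chunk bytes plays the role of the fingerprint (assumption).
    index = {}
    for i in range(0, len(old) - block + 1, block):
        index.setdefault(old[i:i + block], i)

    delta, literal, j = [], bytearray(), 0
    while j < len(new):
        src = index.get(new[j:j + block])
        if src is None:
            # No chunk of the old version starts with these bytes: emit a literal.
            literal.append(new[j])
            j += 1
            continue
        # Extend the match forward as far as both versions agree.
        length = block
        while (src + length < len(old) and j + length < len(new)
               and old[src + length] == new[j + length]):
            length += 1
        if literal:
            delta.append(("add", bytes(literal)))
            literal = bytearray()
        delta.append(("copy", src, length))
        j += length
    if literal:
        delta.append(("add", bytes(literal)))
    return delta

def apply_delta(old: bytes, delta) -> bytes:
    """Rebuild the new version from the old version and the delta."""
    out = bytearray()
    for op in delta:
        if op[0] == "copy":
            _, src, length = op
            out += old[src:src + length]
        else:
            out += op[1]
    return bytes(out)

old = b"the quick brown fox jumps over the lazy dog"
new = b"the quick red fox jumps over the very lazy dog"
delta = make_delta(old, new)
print(delta)
print(apply_delta(old, delta) == new)
```

Only aligned chunks of the old version are indexed, so the dictionary holds about `len(old) / block` entries, while the new version is probed at every byte offset. Matches shorter than `block` bytes are not found, which is the usual trade-off of this kind of scheme.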
