2017
DOI: 10.3906/elk-1601-92

A new word-based compression model allowing compressed pattern matching

Abstract: In this study, a new semistatic data compression model that has a fast coding process and that allows compressed pattern matching is introduced. The name of the proposed model is chosen as tagged word-based compression algorithm (TWBCA) since it has a word-based coding and word-based compressed matching algorithm. The model has two phases. In the first phase a dictionary is constructed by adding a phrase, paying attention to word boundaries, and in the second phase compression is done by using codeword…

Cited by 7 publications
(2 citation statements)
References 19 publications

“…The corresponding pattern matching algorithms for different compression units and compression algorithms are also different. Usually, the short text [21], the suffix [22,23], the word [24], and the character string [25,26] are used as the pattern matching unit of the compressed content. Some studies have used the BM algorithm as a pattern matching algorithm in compression format [27,28].…”
Section: Related Research (mentioning)
confidence: 99%
“…MWCA [4], a word-based compression algorithm that we developed in a previous study, sorts all words according to their frequencies, adds the most frequent 255 words to the D1 dictionary and encodes them as 1 byte and the next 65536 words to the D2 dictionary and encodes them as 2 bytes. Although there have been many different studies in the field of text compression in recent years [5]-[9], the fact that MWCA stores dictionaries and data in different streams provides an important advantage for this study in which only dictionaries are indexed. The advantages of indexing only the word dictionaries created by MWCA instead of indexing the entire documents will be explained in the fourth section.…”
Section: Introduction (mentioning)
confidence: 99%
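
The two-tier dictionary described in the statement above can be illustrated with a minimal Python sketch. It is not the MWCA implementation: the function name `assign_codewords`, the use of a word's rank within its dictionary as the codeword, and the big-endian byte order are all assumptions made for illustration. The excerpt does not say how a decoder tells 1-byte codes from 2-byte codes or how words outside both dictionaries are handled, so the sketch leaves both out.

```python
from collections import Counter

D1_SIZE = 255      # most frequent words, 1-byte codes (sizes taken from the quoted text)
D2_SIZE = 65536    # next most frequent words, 2-byte codes


def assign_codewords(words):
    """Rank words by frequency and split them into two dictionaries.

    Returns (d1, d2, codes), where codes maps each word to a bytes codeword.
    Assumption: the codeword is simply the word's index inside its dictionary.
    """
    ranked = [w for w, _ in Counter(words).most_common(D1_SIZE + D2_SIZE)]
    d1, d2 = ranked[:D1_SIZE], ranked[D1_SIZE:]
    codes = {w: bytes([i]) for i, w in enumerate(d1)}                   # 1 byte each
    codes.update({w: i.to_bytes(2, "big") for i, w in enumerate(d2)})   # 2 bytes each
    return d1, d2, codes


if __name__ == "__main__":
    sample = "the cat and the dog and the bird".split()
    d1, d2, codes = assign_codewords(sample)
    print(codes["the"])   # the most frequent word gets the first 1-byte code
```

Because the dictionaries are returned separately from the codeword map, they can be stored or indexed on their own, which is the property the citing study says it relies on.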