2019
DOI: 10.35940/ijrte.b1806.078219

Preprocessed Text Compression Method for Malayalam Text Files

Abstract: The increasing importance of Unicode for text files implies an increase in the storage space required for data and in the time needed to transmit it, with a corresponding need for data compression. Conventional compressors fare poorly on UTF-8 text, where each character can span multiple bytes. Malayalam, one of the four major languages of the Dravidian family, is represented using Unicode characters. The contribution of this paper is a reversible transformation mapping of the input to reduce…
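The multi-byte point in the abstract can be checked directly. The following is a minimal sketch, not the paper's transformation (which the truncated abstract does not describe): it only counts UTF-8 bytes for a sample Malayalam word. The sample word and variable names are illustrative assumptions.

```python
# Illustrative only: the sample word is an assumption, not taken from the paper.
# The escapes spell the word "Malayalam" in Malayalam script (block U+0D00-U+0D7F).
text = "\u0d2e\u0d32\u0d2f\u0d3e\u0d33\u0d02"

utf8 = text.encode("utf-8")

# UTF-8 byte length of each individual code point in the word.
bytes_per_code_point = [len(ch.encode("utf-8")) for ch in text]

print("code points:", len(text))
print("UTF-8 bytes:", len(utf8))
print("bytes per code point:", bytes_per_code_point)
```

Every code point in the Malayalam block falls in the range that UTF-8 encodes with three bytes, so a byte-oriented compressor sees three symbols where a reader sees one.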

Cited by: 2 publications (2 citation statements)
References: 2 publications
“…(a) Structure of text [21], [29], [62]: LZW depends on text structure, target language characteristics, and pattern repetition rate. (b) Text length [42], [45]: LZW is not ideal for compressing short texts because it limits repetition. (c) Dictionary size [36], [63], [64]: The dictionary size affects LZW's performance by balancing the compression ratio and processing time.…”
Section: B. LZW Technique (mentioning)
confidence: 99%
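Point (c) in the statement above, the effect of dictionary size, can be illustrated with a minimal sketch of textbook LZW. This is not the implementation from the citing study: the function names and the `max_dict_size` parameter are assumptions for illustration, and the dictionary simply stops growing once it is full.

```python
def lzw_compress(data: bytes, max_dict_size: int = 4096) -> list[int]:
    """Return LZW codes for data; the dictionary stops growing at max_dict_size."""
    if max_dict_size < 256:
        raise ValueError("max_dict_size must be at least 256")
    # Start with one entry per possible byte value.
    table = {bytes([i]): i for i in range(256)}
    codes = []
    current = b""
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            current = candidate  # keep extending the current match
        else:
            codes.append(table[current])
            if len(table) < max_dict_size:
                table[candidate] = len(table)  # learn the new phrase
            current = bytes([byte])
    if current:
        codes.append(table[current])
    return codes


def lzw_decompress(codes: list[int], max_dict_size: int = 4096) -> bytes:
    """Invert lzw_compress; must be called with the same max_dict_size."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            # The code refers to the phrase the encoder had just added.
            entry = previous + previous[:1]
        else:
            raise ValueError("invalid LZW code")
        out.append(entry)
        if len(table) < max_dict_size:
            table[len(table)] = previous + entry[:1]
        previous = entry
    return b"".join(out)


sample = b"abababababab" * 50
small = lzw_compress(sample, max_dict_size=256)   # no room to learn phrases
large = lzw_compress(sample, max_dict_size=4096)
print(len(sample), len(small), len(large))
```

With `max_dict_size=256` the dictionary never learns a phrase, so one code is emitted per input byte. A larger dictionary emits fewer codes, but each code needs more bits and the table costs more memory and lookup time, which is the trade-off the statement describes.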
“…Some studies have modified this technique's working principle and ignored the language's characteristics so that it can be applied to texts in different languages [31], [41] (as shown in Table 1). In contrast, other studies take advantage of the inherent features of a particular language to compress text, by modifying or processing the text to fit the underlying principles of the technique [22], [25], [42], [43], [44], [45], [46], [47], [48], [49] (as demonstrated in Table 2 for a specific language). This study combines the two methods to reach the maximum possible compression ratio, as it is based on denaturing the data and reducing its actual size before compression, to make it more compressible.…”
Section: Introduction (mentioning)
confidence: 99%