Making Compression Algorithms for Unicode Text

Gleave, Adam; Steinruecken, Christian

doi:10.1109/dcc.2017.58

Cited by 7 publications

(4 citation statements)

References 6 publications

(12 reference statements)

“…In this method of compression, the initial part is to read data from the input stream present in the form of UTF-8 characters. The only known work which reads data in the form of UTF-8 characters belongs to Gleave et al (2017). In their work, they had investigated the effectiveness of different token distributions while being used as a base distribution for LZW.…”

Section: Methods (mentioning)

confidence: 99%

“…Barua et al (2017) projected an enhanced LZW compression technique for Bangla dialect considering the unique features of that language. Gleave et al (2017) represented modified techniques with escaping on LZW and PPM (Prediction with Partial Matching). An abridged bit representation in the dictionary is an indicative for each Unicode character.…”

Section: Introduction (mentioning)

confidence: 99%

A Compression System for Unicode Files Using an Enhanced Lzw Method

Anto, Ramachandran

2020

JST

Data compression plays a vital role in computing: it reduces both the space a file occupies and the time taken to access it. This work presents a method for compressing and decompressing a UTF-8 encoded data stream based on the Lempel-Ziv-Welch (LZW) method. A special-purpose LZW compression scheme is worthwhile because many applications use Unicode text. The system comprises a compression module, configured to compress the Unicode data by creating the dictionary entries in Unicode format. This is accomplished with adaptive compression tables built from the data being compressed, so that they reflect the characteristics of the most recent input. The decompression module decompresses the compressed file using the encoded output together with the table of unique Unicode characters obtained from the compression module. A remarkable gain in compression can be obtained, wherein the knowledge gathered from the source is used in the decompression process.
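The general idea described in this abstract can be illustrated with a minimal sketch of LZW operating on Unicode characters rather than bytes. This is not the authors' implementation: the function names `lzw_compress` and `lzw_decompress` are hypothetical, the initial dictionary is assumed to be the sorted set of unique characters in the input, and codes are left as plain integers instead of being packed into bits.

```python
def lzw_compress(text):
    """Compress a Unicode string; return (alphabet, codes).

    The alphabet (unique characters of the input) seeds the dictionary
    and must be passed to the decompressor alongside the codes.
    """
    alphabet = sorted(set(text))
    table = {ch: i for i, ch in enumerate(alphabet)}
    codes = []
    current = ""
    for ch in text:
        candidate = current + ch
        if candidate in table:
            # Keep extending the current match.
            current = candidate
        else:
            # Emit the longest known match and learn the new string.
            codes.append(table[current])
            table[candidate] = len(table)
            current = ch
    if current:
        codes.append(table[current])
    return alphabet, codes


def lzw_decompress(alphabet, codes):
    """Rebuild the original string from the alphabet and the codes."""
    if not codes:
        return ""
    table = list(alphabet)
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code < len(table):
            entry = table[code]
        elif code == len(table):
            # The code refers to the entry the compressor had just created.
            entry = previous + previous[0]
        else:
            raise ValueError("invalid LZW code")
        out.append(entry)
        table.append(previous + entry[0])
        previous = entry
    return "".join(out)


if __name__ == "__main__":
    sample = "മലയാളം മലയാളം മലയാളം"
    alphabet, codes = lzw_compress(sample)
    assert lzw_decompress(alphabet, codes) == sample
    print(len(sample), "characters ->", len(codes), "codes")
```

Because the dictionary is keyed on whole characters, a multi-byte UTF-8 character always occupies one dictionary symbol, which is the property the abstract attributes to creating dictionary entries "in Unicode format".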

“…It promises ratio improvements of around 16% over state of the art compression tools. Text compression beyond ASCII, applicable to the human-readable log messages, has been explored by modifications to existing bytelevel compressors such as bzip2, with significant effectiveness improvements reported [10], and semantic compression for text has been investigated as well [11].…”

Section: Related Work (mentioning)

confidence: 99%

Comparison and Model of Compression Techniques for Smart Cloud Log File Handling

Spillner

2020

2020 International Conference on Communications, Computing, Cybersecurity, and Informatics (CCCI)

Compression as a data coding technique has seen approximately 70 years of research and practical innovation. Nowadays, powerful compression tools with good trade-offs exist for a range of file formats from plain text to rich multimedia. Yet cloud providers face a dilemma: they must reduce log data sizes as much as possible while keeping as much data as possible for regulatory reasons and compliance processes, so many companies are looking for smarter solutions beyond brute compression. In this paper, a comprehensive applied research setting around network and system logs is introduced by comparing text compression ratios and performance. The benchmark encompasses 13 tools and 30 tool-configuration-search combinations. The tool and algorithm relationships as well as benchmark results are modelled in a graph. After discussing the results, the paper reasons about limitations of individual approaches and suitable combinations of compression with smart adaptive log file handling. The adaptivity is based on the exploitation of knowledge on format-specific compression characteristics expressed in the graph, for which a proof-of-concept advisor service is provided.

“…Linkon et al projected a changed LZW dictionary based index compression technique for Bangle dialect in [5]. Gleave et al represent a new modified technique of byte-oriented compressors to work straight on Unicode characters [6]. In [8], the system is configured to maintain a set of character tables and a cluster table in memory .…”

Section: Related Work (mentioning)

confidence: 99%

Preprocessed Text Compression Method for Malayalam Text Files

A, R

2019

IJRTE

The increasing importance of Unicode for text files implies an increase in the storage space required for data and in the time needed to transmit it, with a corresponding need for data compression. Conventional compressors fare poorly on UTF-8 texts, where each character can span multiple bytes. Malayalam, one of the four major languages of the Dravidian family, is represented using Unicode characters. The contribution of this paper is a reversible transformation that maps the input to a smaller form before a general-purpose compression method is applied. After this preprocessing, LZW compression achieves better compression on Malayalam text files containing any characters, including ASCII characters. This method can be extended to files in any native language that contain mostly the characters of only one script.
