Natural Language Compression on Edge-Guided text preprocessing

Martínez‐Prieto, Miguel A.; Adiego, Joaquín; Fuente, Pablo de la

doi:10.1016/j.ins.2011.07.039

Cited by 8 publications

(8 citation statements)

References 41 publications

“…It retains the original Huffman algorithm features, but process the text as a sequence of words and replaces the original words with variable-bit code-words to represent them. This word-based Huffman largely outperforms the traditional character-based one by achieving near about 25% compression ratio instead of 65%, achieved by the character-based Huffman [12].…”

Section: Natural Language Text Modeling and Compression (mentioning)

confidence: 91%
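
The word-based Huffman idea in the statement above can be illustrated with a minimal sketch. It assumes plain whitespace tokenization and ignores separators and dictionary storage, so it is not the coder used in the cited papers; the function names `word_huffman_codes` and `encode` are hypothetical.

```python
import heapq
from collections import Counter
from itertools import count


def word_huffman_codes(text):
    """Return a {word: bitstring} Huffman code over whitespace-separated words."""
    freqs = Counter(text.split())
    if not freqs:
        return {}
    if len(freqs) == 1:
        # A single distinct word still needs a one-bit code.
        return {next(iter(freqs)): "0"}
    tie = count()  # tie-breaker so the heap never compares dicts
    heap = [(f, next(tie), {w: ""}) for w, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing their codes.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {w: "0" + c for w, c in left.items()}
        merged.update({w: "1" + c for w, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


def encode(text, codes):
    """Concatenate the code-words of the words in text (separators are dropped)."""
    return "".join(codes[w] for w in text.split())
```

Frequent words receive code-words no longer than those of rare words, which is where the gain over character-based coding comes from on skewed word distributions.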

“…LZ is a well known family of compression algorithms as these algorithms are capable of deriving more compressed text file using limited resources such as memory and also with reasonably good speed of compression and decompression. Different popular compressors are designed on the LZ platform such as gzip and p7zip [12].…”

Section: Dictionary-based Compression (mentioning)

confidence: 99%

“…punctuation, space or other spatial characters). It was found that about 70%-80% of the separators are single spaces [12].…”

Section: Natural Language Text Modeling and Compression (mentioning)

confidence: 99%

“…In natural language texts the word distributions are much skewed: there are few words that have extreme frequencies and many words which have low frequencies. This feature is approximated in Zipf's Law [12]. Zipf's Law attempts to capture the distribution of word frequencies in the text.…”

Section: Natural Language Text Modeling and Compression (mentioning)

confidence: 99%

A Graph Theoretical Preprocessing Step for Text Compression

Phukon, Baruah

2015

IJMUE

This paper presents CSGM 2 , a text preprocessing technique for compression purposes. It converts the original text into a word net (graph representation) and can retain the detailed contextual information such as word proximity. Specific directed graph is proposed to model this word net where words are stored in vertices and edges represent word transitions. The word net is fully capable of holding the natural word order in the original text and hence can be used directly for encoding purposes.
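
As a rough illustration of a word net that keeps the original word order, the sketch below stores words as vertices, records each word's successors as outgoing edges, and encodes the text as the sequence of edge choices taken from the first word. This is an assumption-laden toy, not the CSGM structure of the paper; `build_word_graph`, `encode_path`, and `decode_path` are hypothetical names.

```python
def build_word_graph(words):
    """Map each word to its successor words, in first-seen order."""
    graph = {}
    for a, b in zip(words, words[1:]):
        succ = graph.setdefault(a, [])
        if b not in succ:
            succ.append(b)
    return graph


def encode_path(words, graph):
    """Encode the text as indices of the edge followed at each vertex."""
    return [graph[a].index(b) for a, b in zip(words, words[1:])]


def decode_path(start, path, graph):
    """Rebuild the word sequence by walking the graph from the start word."""
    out = [start]
    for i in path:
        out.append(graph[out[-1]][i])
    return out
```

Because every transition in the text is an edge of the graph, walking the recorded edge indices from the first word reproduces the text exactly, and the small indices are what a back-end compressor would then encode.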

“…Several scholar works on the field of focus on the same goal as ours: Preprocess text to help it compress better. One such study applies Edge-guided text compression that is based on graphs, ordered pairs and sets [1] to transform text into a word net; the adjacencies of the word have a direct relationship with the unique graph, which is the result of the word net. Our approach has less complexity as it only involves letter repositioning, rather than complex data structures as graphs.…”

Section: Related Work (mentioning)

confidence: 99%

A Novel Text Processing for Better Compression and Security in Cloud

Çankaya, Vinayak

2016

IJCTE

We introduce LG-encoding, a novel approach to text encoding that shuffles the position of letters anticipating an improved compression performance. Our technique brings together the repeating letters in a word, so as to inflate redundancy to be exploited by the compression algorithm to follow. The encoding process introduces no significant overhead: It is easily reversible as it only involves repositioning the letters in a text. We experiment LG-encoding on text from 4 different source languages: English, French, German, and Spanish with a set of well-known compression algorithms that follows the encoding: Arithmetic Coding, Huffman Coding, BWT and PPM. Our results yield promising outcomes as we achieve substantially better compression rates for Arithmetic Coding and Huffman Coding that follows LG-encoding. We also propose use of our method in large data repositories, such as cloud, as it also provides significant level of security by shuffling the letters of words in text.

Index Terms: Text encoding, lossless text compression.

A New Compression Scheme for Secure Transmission

Begum, Venkataramani

2013

Int. J. Autom. Comput.
