Proceedings DCC '97. Data Compression Conference
DOI: 10.1109/dcc.1997.581953

Models of English text

Cited by 21 publications (33 citation statements)
References 15 publications
“…For encoding the vocabulary output file, standard order-1 byte-based PPM is quite effective. For the symbols output file, where the symbol numbers can grow quite large for some languages, a technique similar to word-based PPM [4] works well, with the alphabet size being unbounded. Another finding is that an order-4 model works best across the languages tested.…”
Section: Preprocessing and Postprocessing
confidence: 99%
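The unbounded-alphabet idea mentioned above can be illustrated with a minimal sketch (this is an assumption-laden toy, not the cited papers' actual coder): known words are emitted as symbol numbers, while a novel word triggers an escape and is spelled out, growing the vocabulary as the stream is processed.

```python
def encode_stream(words):
    """Toy sketch of word-based symbol numbering over an unbounded alphabet.

    Known words emit their integer id; a novel word emits an ESCAPE marker
    followed by the word itself (which a real coder would then encode
    character by character), and the vocabulary grows without bound.
    """
    ESCAPE = -1
    vocab = {}
    out = []
    for w in words:
        if w in vocab:
            out.append(vocab[w])        # seen before: emit its symbol number
        else:
            out.append(ESCAPE)          # escape: signal a novel word
            out.append(w)               # spelled out downstream, char by char
            vocab[w] = len(vocab)       # alphabet size is unbounded
    return out

encoded = encode_stream("to be or not to be".split())
# first occurrences escape and spell the word; repeats reuse their ids
```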
“…Variants of the PPM algorithm (such as PPMC and PPMD) are distinguished by the escape mechanism used to back off to lower-order models when new symbols are encountered in the context. PPM has also been applied successfully to many natural language processing (NLP) applications such as cryptology, language identification, and text correction [4], [5].…”
Section: Prediction By Partial Matching (PPM)
confidence: 99%
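The escape-and-backoff mechanism described in that statement can be sketched as follows. This is a minimal PPMC-style model (escape probability d/(n+d) at each context, where d is the number of distinct symbols seen and n the total count); it omits exclusion and arithmetic coding, so it is an illustration of the backoff idea, not a faithful implementation of any cited variant.

```python
from collections import defaultdict

class SimplePPM:
    """Minimal PPM-style model with PPMC-like escapes (illustrative sketch)."""

    def __init__(self, max_order=2):
        self.max_order = max_order
        # counts[context_tuple][symbol] -> frequency
        self.counts = defaultdict(lambda: defaultdict(int))

    def update(self, history, symbol):
        # Record the symbol under every context from order 0 up to max_order.
        for k in range(self.max_order + 1):
            ctx = history[len(history) - k:]
            self.counts[ctx][symbol] += 1

    def prob(self, history, symbol):
        # Back off from the longest context to order 0; PPMC gives the
        # escape a probability of d / (n + d) at each level.
        p = 1.0
        for k in range(self.max_order, -1, -1):
            ctx = history[len(history) - k:]
            seen = self.counts[ctx]
            n = sum(seen.values())
            d = len(seen)
            if n == 0:
                continue  # nothing seen in this context: escape for free
            if symbol in seen:
                return p * seen[symbol] / (n + d)
            p *= d / (n + d)  # escape to the next shorter context
        return p / 256  # order -1 fallback: uniform over a byte alphabet

model = SimplePPM(max_order=2)
text = b"abracadabra"
for i, s in enumerate(text):
    model.update(tuple(text[max(0, i - 2):i]), s)

p = model.prob(tuple(b"ra"), ord("c"))  # probability of 'c' after "ra"
```

In this toy run, 'c' has been seen exactly once after the context "ra", so the order-2 model predicts it directly without escaping.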
“…Therefore, we report in this paper results on the use of PPM on natural language texts as well as results on the Calgary Corpus, a standard corpus used to compare text compression algorithms. PPM has achieved excellent results in various natural language processing applications such as language identification and segmentation, text categorisation, cryptology, and optical character recognition (OCR) [7].…”
Section: Prediction By Partial Matching
confidence: 99%
“…Then the probability for all symbols or characters will be estimated and encoded by 1/|A|, where |A| is the size of the alphabet in the context. The experiments show that the maximum order that usually gives good compression rates for English is five [1], [7], [8]. For Arabic text, the experiments show that order seven of the PPM algorithm gives a good compression rate [9].…”
Section: Prediction By Partial Matching
confidence: 99%