2011
DOI: 10.1109/tit.2011.2145170

On the Vocabulary of Grammar-Based Codes and the Logical Consistency of Texts

Abstract: The article presents a new interpretation for Zipf-Mandelbrot's law in natural language which rests on two areas of information theory. Firstly, we construct a new class of grammar-based codes and, secondly, we investigate properties of strongly nonergodic stationary processes. The motivation for the joint discussion is to prove a proposition with a simple informal statement: If a text of length n describes n^β independent facts in a repetitive way then the text contains at least n^β / log n different words, u…

Cited by 35 publications (131 citation statements)
References 33 publications

“…A few words of clarification are needed here. Roughly speaking, admissibly minimal grammar-based codes are compression algorithms that represent a text as the smallest context-free grammar that generates the text as its sole production (Kieffer and Yang, 2000; Dębowski, 2011b). By the results of Charikar et al (2005), we may suppose that these algorithms are computationally intractable.…”
Section: Introduction (mentioning)
confidence: 99%
“…Thus we may expect that Herdan's law for nonterminal symbols is a certain approximation of Herdan's law for words. Investigating mathematical properties of admissibly minimal grammar-based codes, Dębowski (2006, 2011b) showed that Herdan's law for nonterminal symbols is a consequence of the relaxed Hilberg conjecture. Namely, if an arbitrary stationary stochastic process satisfies the relaxed Hilberg conjecture then texts generated by this process satisfy Herdan's law for nonterminal symbols.…”
Section: Introduction (mentioning)
confidence: 99%
“…In contrast, by Theorem 7 from Dębowski (2011), the analogous decomposition for the block entropy reads…”
Section: The Proof Of Theorem (mentioning)
confidence: 99%
“…Li and Vitányi (2008). While the shortest program for generating a string cannot be efficiently found, there exist also computable universal codes such as the Lempel-Ziv code (Ziv and Lempel, 1977) and grammar-based codes (Kieffer and Yang, 2000; Dębowski, 2011). In particular, the excess length E^C_µ(n) of admissibly minimal grammar-based codes is bounded above by the number of distinct nonterminal symbols in the grammar used for compression (Dębowski, 2011).…”
Section: Introduction (mentioning)
confidence: 99%