2015
DOI: 10.1080/09296174.2015.1106268

The Relaxed Hilberg Conjecture: A Review and New Experimental Support

Abstract: The relaxed Hilberg conjecture states that the mutual information between two adjacent blocks of text in natural language grows as a power of the block length. The present paper reviews recent results concerning this conjecture. First, the relaxed Hilberg conjecture holds when the texts repeatedly describe a random reality and Herdan's law for facts repeatedly described in the texts is obeyed. Second, the relaxed Hilberg conjecture implies Herdan's law for set phrases, which can be associated with the better …
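In symbols (an editorial gloss, not a formula quoted from the paper): writing $X_{1:n}$ for the first $n$ symbols of a text, the relaxed Hilberg conjecture asserts a power-law growth of the mutual information between adjacent blocks,

$$ I(X_{1:n}; X_{n+1:2n}) \propto n^{\beta}, \qquad 0 < \beta < 1, $$

where the exponent $\beta$ is the quantity estimated in the citing papers excerpted below.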

Cited by 7 publications (14 citation statements)
References 29 publications
“…In simple words, whereas the entropy rate measures how hard it is to predict the text, the exponent β measures how hard it is to learn to predict the text. Whereas the entropy rate depends strongly on the kind of script, the exponent β turned out to be approximately constant, β ≈ 0.884, across six languages, as supposed in [9,11,12,33]. Thus we suppose that the exponent β is a language universal and that it characterizes the general complexity of learning natural language, all languages being equally hard to learn in spite of apparent differences.…”
Section: Discussion
confidence: 99%
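The distinction drawn in this statement can be restated through the Hilberg-type ansatz for block entropy that is standard in this literature (an editorial assumption, not a formula quoted from the citing paper):

$$ H(X_{1:n}) \approx h\,n + A\,n^{\beta}. $$

Here the entropy rate $h$ governs how hard the text is to predict, while the $n^{\beta}$ correction governs how hard the predictor is to learn. The same ansatz recovers the power-law mutual information stated in the abstract, since $I(X_{1:n}; X_{n+1:2n}) = 2H(X_{1:n}) - H(X_{1:2n}) \approx A(2 - 2^{\beta})\,n^{\beta}$.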
“…As implicitly or explicitly supposed in [9,11,12,33], the β exponents could be language universals, which is tantamount to saying that all human languages are equally hard to learn. Universality of the exponent β ≈ 0.9 on much smaller data sets for the English, German, and French languages, using the ansatz f_1(n), was previously reported in paper [33] in the case of the Lempel-Ziv code rather than the PPM code. Our experimental data further corroborate the universality of β across a larger set of languages and a different universal code.…”
Section: Universality of the Estimates of Exponent β
confidence: 99%
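As a rough illustration of the kind of estimate discussed in this statement, the sketch below compresses growing prefixes of a corpus and fits a two-term power-law model to the compressed lengths. It is a minimal editorial sketch, not the cited authors' pipeline: zlib's DEFLATE (an LZ77-family compressor) stands in for the Lempel-Ziv and PPM codes used in the cited work, the generic model C(n) = h·n + A·n^β stands in for the paper's ansatz f_1(n), and corpus.txt is a hypothetical input file.

```python
import zlib

import numpy as np
from scipy.optimize import curve_fit


def compressed_lengths(text: bytes, n_points: int = 20):
    """Compressed sizes (in bits) of exponentially spaced prefixes of `text`."""
    # Prefix lengths from 10^3 symbols up to the full corpus.
    ns = np.unique(np.logspace(3, np.log10(len(text)), n_points).astype(int))
    bits = np.array([8.0 * len(zlib.compress(text[:n], 9)) for n in ns])
    return ns, bits


def model(n, h, A, beta):
    # C(n) = h*n + A*n**beta: the linear term plays the role of the
    # entropy rate, the power-law term the Hilberg-type correction.
    return h * n + A * n ** beta


if __name__ == "__main__":
    with open("corpus.txt", "rb") as f:  # hypothetical plain-text corpus
        text = f.read()
    ns, bits = compressed_lengths(text)
    (h, A, beta), _ = curve_fit(model, ns, bits, p0=[2.0, 10.0, 0.9],
                                bounds=([0.0, 0.0, 0.0], [16.0, np.inf, 1.0]))
    print(f"entropy rate h ~ {h:.3f} bits/symbol, beta ~ {beta:.3f}")
```

A caveat on the design: off-the-shelf compressors are known to overestimate entropy rates, so the fitted h and β here are only indicative; the β ≈ 0.884 of the cited work comes from far more careful estimation on large multilingual corpora.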
“…[25], a completely formal proof of the theorem about facts and words for strictly minimal grammar-based codes [23, 26] was provided. The related theory of natural language was later reviewed in [27, 28] and supplemented by a discussion of Santa Fe processes in [29]. A drawback of this theory at the time was that the strictly minimal grammar-based codes used in the statement of the theorem about facts and words are not computable in polynomial time [26].…”
Section: Introduction
confidence: 99%
“…Our work establishes a general link between syntactic structure and the statistical properties of texts, joining other work that has established connections between grammatical rules and information-theoretic statistics (Dębowski, 2015). We believe the HDMI Hypothesis can form the basis for improved grammar induction algorithms by providing a new perspective on the head-outward generative models that have formed the basis of most work in that area.…”
Section: Results
confidence: 67%