2011
DOI: 10.1063/1.3630929

Excess entropy in natural language: Present state and perspectives

Abstract: We review recent progress in understanding the meaning of mutual information in natural language. Let us define words in a text as strings that occur sufficiently often. In a few previous papers, we have shown that a power-law distribution for words so defined (a.k.a. Herdan's law) is obeyed if there is a similar power-law growth of (algorithmic) mutual information between adjacent portions of texts of increasing length. Moreover, the power-law growth of information holds if texts describe a complicated infini…
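As a reading aid, the two power laws that the abstract relates can be written explicitly. The notation below (V(n) for the number of distinct words in a text of length n, X_{1:n} for its first n symbols) is ours, not the paper's; the paper's claim is that the first law is obeyed when the second holds with a similar exponent:

```latex
% Reading aid with assumed notation; not quoted from the paper.
\begin{align*}
  V(n) &\propto n^{\beta}, \qquad 0 < \beta < 1
    && \text{(Herdan's law: power-law vocabulary growth)} \\
  I\bigl(X_{1:n};\, X_{n+1:2n}\bigr) &\propto n^{\beta'},
    \qquad \beta' \approx \beta
    && \text{(power-law growth of mutual information)}
\end{align*}
```

Here I(·;·) denotes the (algorithmic) mutual information between the two adjacent halves of a text of length 2n, as in the abstract.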

Cited by 21 publications (23 citation statements: 0 supporting, 23 mentioning, 0 contrasting). References 47 publications.

Citation statements (selected):
“…However, these later results may be also highly interesting for linguists and thus worth popularizing in this venue. A similar review for physicists and researchers working in complex systems was published by Dębowski (2011a).…”
Section: Introduction (mentioning). Confidence: 99%
“… It has a long history and is widely employed as a measure of correlation and complexity in a variety of fields, from ergodic theory and dynamical systems to neuroscience and linguistics [1][2][3][4][5][6]. For a review the reader is referred to [7].…”
Section: Introduction (mentioning). Confidence: 99%
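For context on this and the following statements: the quantity at issue, the excess entropy E, has a standard definition due to Crutchfield and Feldman, which for stationary processes coincides with the mutual information between the infinite past and future. The notation below is the standard one, not quoted from the citing paper:

```latex
% Standard definitions; H is Shannon block entropy, h the entropy rate.
\begin{align*}
  h &= \lim_{n\to\infty} \frac{H(X_{1:n})}{n}
    && \text{(entropy rate)} \\
  E &= \lim_{n\to\infty} \bigl[ H(X_{1:n}) - n\,h \bigr]
     = I\bigl(X_{-\infty:0};\, X_{1:\infty}\bigr)
    && \text{(excess entropy)}
\end{align*}
```

Hilberg-type behavior corresponds to the block entropy growing as H(X_{1:n}) ≈ n h + A n^{β}, so that E diverges.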
“…Indeed, one can show that any process generated by a stationary, countable-state HMM either has positive entropy rate or consists entirely of periodic sequences, which these do not. Versions of the Santa Fe Process introduced in [6] are finite-alphabet, infinitary processes with positive entropy rate. However, they were not constructed directly as hidden Markov processes, and it seems unlikely that they should have any stationary, countable-state presentations either.…”
Section: Introduction (mentioning). Confidence: 99%
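The Santa Fe process mentioned in this statement has a compact construction in Dębowski's papers: the k-th symbol is a pair X_k = (K_k, Z_{K_k}), where the indices K_k are drawn i.i.d. from a power-law (Zipf) distribution and (Z_n) is a single fixed sequence of fair coin flips, so the same elementary "facts" keep being restated. Below is a minimal Python sketch of that basic, countable-alphabet version; the finite-alphabet versions the quote refers to additionally encode each pair into a finite alphabet. The function name, the default exponent, and the use of NumPy's Zipf sampler are our choices for illustration:

```python
import numpy as np

def santa_fe(length, beta=0.5, seed=0):
    """Sample the basic Santa Fe process X_k = (K_k, Z_{K_k}).

    Assumptions for this sketch: K_k i.i.d. Zipf with exponent 1/beta,
    i.e. P(K = n) proportional to n**(-1/beta), which requires
    0 < beta < 1 so that 1/beta > 1; Z_1, Z_2, ... is one fixed
    sequence of fair coin flips shared by every time step.
    """
    rng = np.random.default_rng(seed)
    facts = {}   # the fixed bits Z_n, drawn lazily but never redrawn
    sample = []
    for _ in range(length):
        k = int(rng.zipf(1.0 / beta))        # which fact gets mentioned
        if k not in facts:                   # fix Z_k on first mention
            facts[k] = int(rng.integers(2))  # fair coin: 0 or 1
        sample.append((k, facts[k]))         # emit the pair (K, Z_K)
    return sample

print(santa_fe(10))
```

Because the bits Z_n never change, two long blocks far apart in the sequence still share information about the same facts; this repetition drives the power-law growth of block mutual information, while the i.i.d. indices keep the entropy rate positive.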
“…Many natural processes in physics, biology, neuroscience, finance, and quantitative social science are highly non-Markovian with slowly asymptoting or divergent E [40]. This implies rather small spectral gaps if the process has a countable infinity of causal states, e.g., as in Ref.…”
Section: Curse of Dimensionality in Predictive Rate-Distortion (mentioning). Confidence: 99%