2019
DOI: 10.1162/coli_a_00355
Evaluating Computational Language Models with Scaling Properties of Natural Language

Abstract: In this article, we evaluate computational models of natural language with respect to the universal statistical behaviors of natural language. Statistical mechanical analyses have revealed that natural language text is characterized by scaling properties, which quantify the global structure in the vocabulary population and the long memory of a text. We study whether five scaling properties (given by Zipf's law, Heaps' law, Ebeling's method, Taylor's law, and long-range correlation analysis) can serve for evalu…
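For readers wanting to see how such an evaluation works in practice, here is a minimal Python sketch (not the authors' code; the window size and fitting choices are illustrative) of one of the five measures, the Taylor exponent: split a token sequence into fixed-size windows, compute each word's mean and standard deviation of per-window counts, and fit σ ∝ μ^α in log-log space. An i.i.d. or Markov-like sequence yields α ≈ 0.5, while natural text yields a larger exponent.

```python
# Minimal sketch (not the paper's code): estimate the Taylor exponent alpha
# by fitting sigma ∝ mu**alpha over per-window word counts.
import numpy as np
from collections import Counter

def taylor_exponent(tokens, window=1000):
    """Fit sigma = c * mu**alpha across words' per-window counts."""
    n_windows = len(tokens) // window
    counts = [Counter(tokens[i * window:(i + 1) * window]) for i in range(n_windows)]
    vocab = set(tokens[:n_windows * window])
    mus, sigmas = [], []
    for w in vocab:
        per_window = np.array([c[w] for c in counts], dtype=float)
        mu, sigma = per_window.mean(), per_window.std()
        if mu > 0 and sigma > 0:          # log-log fit needs positive values
            mus.append(mu)
            sigmas.append(sigma)
    # slope of log(sigma) vs. log(mu) is the Taylor exponent
    alpha, _ = np.polyfit(np.log(mus), np.log(sigmas), 1)
    return alpha

# An i.i.d. shuffle of any text should give alpha close to 0.5;
# real text typically gives a larger value.
```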

Cited by 21 publications (15 citation statements)
References 35 publications
“…Besides these good BA graph results, however, all random walks on a graph structure learned from Moby Dick (first row of the second block and first two rows of the third block of table 1) produced α ≈ 0.5. This suggests that linguistic sequences cannot be modeled by Markov models, which confirms both previous mathematical results (Lin and Tegmark 2016) and experimental results (Takahashi and Tanaka-Ishii 2018). The main reason is that the mean degree of the Markov models was large (above 10).…”
Section: Discussion (supporting)
confidence: 89%
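The Markov-model claim quoted above is easy to probe with a toy experiment. As a hedged sketch (reusing the illustrative taylor_exponent helper above), a random walk on the bigram graph of a text is exactly a first-order Markov chain over words, and its Taylor exponent is expected to come out near 0.5:

```python
# Sketch (assumes the taylor_exponent helper above): a random walk on the
# bigram graph of a text is a first-order Markov chain over words.
import random
from collections import defaultdict

def markov_walk(tokens, length=100_000, seed=0):
    rng = random.Random(seed)
    successors = defaultdict(list)
    for a, b in zip(tokens, tokens[1:]):
        successors[a].append(b)          # multi-edges encode transition frequencies
    state = rng.choice(tokens)
    walk = []
    while len(walk) < length:
        nxt = successors.get(state)
        if not nxt:                      # dead end: restart the walk
            state = rng.choice(tokens)
            continue
        state = rng.choice(nxt)
        walk.append(state)
    return walk

# taylor_exponent(markov_walk(tokens)) is expected to come out near 0.5,
# in line with the quoted observation.
```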
“…Our group (Takahashi and Tanaka-Ishii 2018) has also shown how all these models except neural models could not produce a Taylor exponent larger than 0.5. For example, figure 5 shows the Taylor analyses of texts generated by a first-order Markov model trained with the real Moby Dick and by the Simon process.…”
Section: Discussion (mentioning)
confidence: 95%
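The Simon process mentioned in this statement is a classic rich-get-richer generator: with probability a, a brand-new word type is introduced; otherwise an earlier token is copied uniformly at random. A minimal sketch (the parameter a = 0.1 is illustrative, not taken from the cited work):

```python
# Minimal sketch of the Simon process: with probability a emit a new word
# type, otherwise copy a uniformly random earlier token (rich-get-richer).
import random

def simon_process(n_tokens, a=0.1, seed=0):
    rng = random.Random(seed)
    seq = [0]                 # word types are integers; start with one token
    new_type = 1
    while len(seq) < n_tokens:
        if rng.random() < a:
            seq.append(new_type)
            new_type += 1
        else:
            seq.append(rng.choice(seq))
    return seq

# The output follows Zipf's law, but its Taylor exponent stays near 0.5,
# consistent with the quoted observation that only neural models exceed 0.5.
```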
“…Therefore, the task of word segmentation has changed from the use of algorithms to modeling [21]. However, this change results in significant disadvantages, namely [22]: (1) a parameter space too large to be practical, and (2) a sparse data matrix. The averaged perceptron (AP) [23] refers to a perceptron that records the cumulative value of the feature weights and uses averaging to obtain the final model [24]. Although the segmentation performance of the model trained with the original training set is not very high, the improvement is significant after incremental training.…”
Section: Related Work (mentioning)
confidence: 99%
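For context on the averaged perceptron described in this statement, here is a generic sketch of the weight-accumulation-and-averaging idea (not the cited segmenter's implementation; the binary-classification setting is an assumption made for brevity):

```python
# Generic averaged-perceptron sketch: keep a running sum of the weight
# vector after every example and return the average as the final model.
import numpy as np

def averaged_perceptron(X, y, epochs=5):
    """X: (n, d) feature matrix; y: labels in {-1, +1}."""
    n, d = X.shape
    w = np.zeros(d)
    w_sum = np.zeros(d)
    seen = 0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (w @ xi) <= 0:       # mistake-driven update
                w += yi * xi
            w_sum += w                   # accumulate after every example
            seen += 1
    return w_sum / seen                  # averaged weights = final model
```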
“…We show that the sensitivity of words to population size is also reflected in their meaning. We also investigate how social media language and city size affect the parameters of Zipf’s Law [40], and how the exponent of Zipf’s Law differs from the literature value [40,41]. We also show that the number of new words needed in longer texts (Heaps' Law [2]) exhibits a sublinear power-law form on Twitter, indicating a decelerating growth of distinct tokens with city size.…”
Section: Introduction (mentioning)
confidence: 99%
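The Heaps'-law behavior quoted here is the relation V(N) ∝ N^β between text length N and vocabulary size V, with β < 1 meaning sublinear (decelerating) growth. A minimal sketch of estimating β (the checkpoint spacing and fitting choices are illustrative, not from the cited work):

```python
# Sketch: fit the Heaps' law exponent beta in V(N) ~ N**beta by tracking
# vocabulary growth along the token stream and regressing in log-log space.
import numpy as np

def heaps_exponent(tokens, n_points=50):
    seen = set()
    sizes, vocab = [], []
    checkpoints = set(np.linspace(1, len(tokens), n_points, dtype=int))
    for i, tok in enumerate(tokens, start=1):
        seen.add(tok)
        if i in checkpoints:
            sizes.append(i)
            vocab.append(len(seen))
    beta, _ = np.polyfit(np.log(sizes), np.log(vocab), 1)
    return beta   # beta < 1 indicates sublinear vocabulary growth
```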