2019
DOI: 10.33048/semi.2019.16.129

A statistical test for the Zipf's law by deviations from the Heaps' law

Abstract: We explore a probabilistic model of an artistic text: the words of the text are chosen independently of each other according to a discrete probability distribution on an infinite dictionary. The words are enumerated 1, 2, ..., and the probability that the i-th word appears is asymptotically a power function of i. Bahadur proved that in this case the number of different words, as a function of the length of the text, is asymptotically a power function, too. On the other hand, in the applied statistics community, there e…
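The model in the abstract can be illustrated with a small simulation. The sketch below is not the authors' code. It assumes the common parametrization p_i proportional to i^(-1/θ) with θ in (0, 1), and it truncates the infinite dictionary to a finite `vocab_size` so that sampling is possible. The names `sample_text`, `distinct_word_counts`, `theta`, and `vocab_size` are hypothetical, chosen for this example only.

```python
import random
from itertools import accumulate


def sample_text(n, theta=0.5, vocab_size=100_000, seed=0):
    """Draw n word indices independently with P(word i) proportional to i**(-1/theta).

    Assumption: the infinite dictionary is cut off at vocab_size words.
    """
    rng = random.Random(seed)
    weights = [i ** (-1.0 / theta) for i in range(1, vocab_size + 1)]
    cumulative = list(accumulate(weights))
    return rng.choices(range(1, vocab_size + 1), cum_weights=cumulative, k=n)


def distinct_word_counts(words):
    """Return [R_1, ..., R_n], where R_k is the number of different words among the first k."""
    seen = set()
    counts = []
    for word in words:
        seen.add(word)
        counts.append(len(seen))
    return counts


if __name__ == "__main__":
    text = sample_text(20_000, theta=0.5)
    counts = distinct_word_counts(text)
    # Under the assumed model, R_n is expected to grow roughly like n**theta.
    for k in (1_000, 5_000, 20_000):
        print(k, counts[k - 1])
```

Plotting log R_k against log k for such a sample should give a roughly straight line whose slope is near the chosen `theta`, which is the power-function growth the abstract refers to. The truncation to a finite dictionary bends the curve for very long texts.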

Cited by 2 publications (2 citation statements)
References 4 publications
“…A recent statistical investigation of 100 translations of the Bible into 100 different languages found that the power law exponent value is almost close to one, which is exactly in accordance with Zipf's law [9]. In this study, Bible translations into Tigrigna, Amharic, and English were statistically analyzed for the following reasons: First, to demonstrate that the frequency of words verse rank order follows Zipf-Mandelbrot's distribution, as per [10]. Second, to investigate text homogeneity using the model created by [11] for the difference between forward and backward processes.…”
Section: Introduction (mentioning)
confidence: 63%
“…We estimate the exponent θ ∈ (0, 1) of the power functions in two different ways. Chebunin and Kovalevskii (2019b) proposed the estimate θ = log R_n − log R_[n/2]. It has been used for analysis of short texts (Zakrevskaya and Kovalevskii, 2019).…”
Section: Empirical Analysis (mentioning)
confidence: 99%
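The estimate quoted above can be sketched in a few lines. Here R_n is taken to be the number of different words among the first n words of the text, and [n/2] the integer part of n/2. The quoted formula, as extracted, shows no logarithm base. Base 2 is an assumption made here: if R_n ≈ C·n^θ, then log R_n − log R_[n/2] ≈ θ·log 2, which equals θ only for base-2 logarithms. The function name `theta_estimate` is hypothetical.

```python
import math


def theta_estimate(words):
    """Estimate theta as log2(R_n) - log2(R_[n/2]).

    R_n is the number of different words in the whole text and
    R_[n/2] the number of different words in its first half.
    Assumption: base-2 logarithms (the quoted formula gives no base).
    """
    n = len(words)
    if n < 2:
        raise ValueError("need at least two words")
    r_full = len(set(words))
    r_half = len(set(words[: n // 2]))
    return math.log2(r_full) - math.log2(r_half)


if __name__ == "__main__":
    sample = "the cat sat on the mat and the dog sat on the log".split()
    print(theta_estimate(sample))
```

Two boundary cases give a quick sanity check: a text in which every word is new has R_n = 2·R_[n/2], so the estimate is 1, while a text that repeats a single word yields 0.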