2014
DOI: 10.1140/epjb/e2014-40805-2

Rank-frequency relation for Chinese characters

Abstract: We show that the Zipf's law for Chinese characters perfectly holds for sufficiently short texts (few thousand different characters). The scenario of its validity is similar to the Zipf's law for words in short English texts. For long Chinese texts (or for mixtures of short Chinese texts), rank-frequency relations for Chinese characters display a two-layer, hierarchic structure that combines a Zipfian power-law regime for frequent characters (first layer) with an exponential-like regime for less frequent charac…
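
To make the rank-frequency relation in the abstract concrete, here is a minimal sketch of how one might count characters, rank them by frequency, and estimate a power-law exponent. It assumes a plain least-squares fit of log frequency against log rank, which is a common quick check and not necessarily the fitting procedure used in the paper. The function names `rank_frequency` and `zipf_exponent` and the synthetic sample text are illustrative assumptions.

```python
import math
from collections import Counter


def rank_frequency(text):
    """Return normalized character frequencies, most frequent first."""
    # Whitespace is ignored; every other character counts as a symbol.
    counts = Counter(ch for ch in text if not ch.isspace())
    total = sum(counts.values())
    return [count / total for _, count in counts.most_common()]


def zipf_exponent(freqs):
    """Estimate gamma in f_r ~ r^(-gamma) by least squares on log-log axes.

    Assumption: a simple log-log regression, used here only for illustration.
    """
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx


# Synthetic sample (not real corpus data): the character of rank r
# appears exactly 2520 / r times, so the frequencies follow f_r ~ 1/r.
chars = "的一是不了人我在有他"
sample = "".join(ch * (2520 // r) for r, ch in enumerate(chars, start=1))

freqs = rank_frequency(sample)
gamma = zipf_exponent(freqs)
```

On a real long text, one would expect the fitted slope to describe only the frequent characters well, since the abstract reports an exponential-like regime for the less frequent ones.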

citations
Cited by 20 publications
(29 citation statements)
references
References 60 publications
“…Recently, Deng et al found that the most used Chinese words R_i(w_k) < 80 "contain most functional characters", and have different statistics in this "pre-Zipfian region" [11].…”
Section: Data and Calculation Results (mentioning)
confidence: 99%
“…Note, however, that matching regime γ tailed exponents is non-trivial. For example, ranked frequency distributions of Chinese texts may be better fitted by stretched exponential functions than by power laws (Deng et al, 2014). In addition, languages show “size-effects” generally present in the tails of frequency distributions that are dependent on the size of text corpora.…”
Section: Language Laws Are Constrained By the Engineering Of Biological Systems (mentioning)
confidence: 99%
“…Given word rank u and frequency F(u) for a word of rank u, Zipf's law suggests the following proportionality formula: As shown here for Les Misérables, the plot typically follows formula (3) only approximately. There have been discussions on how to improve the Zipf model by incorporating such bias (Mandelbrot, 1952, 1965; Gerlach and Altmann, 2013; Deng et al, 2014). To the best of the author's knowledge, however, the question of a mathematical model that fully explains the bias is still under debate.…”
Section: Quantification Of Long-range Correlation (mentioning)
confidence: 99%