2014
DOI: 10.3758/s13423-014-0585-6
Zipf’s word frequency law in natural language: A critical review and future directions

Abstract: The frequency distribution of words has been a key object of study in statistical linguistics for the past 70 years. This distribution approximately follows a simple mathematical form known as Zipf's law. This article first shows that human language has a highly complex, reliable structure in the frequency distribution over and above this classic law, although prior data visualization methods have obscured this fact. A number of empirical phenomena related to word frequencies are then reviewed. These facts a…
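The law the abstract refers to states that the r-th most frequent word in a corpus has frequency roughly proportional to 1/r. A minimal sketch of this rank-frequency relationship, using synthetic counts rather than any real corpus, is:

```python
import math

# Toy illustration of a Zipfian rank-frequency pattern (synthetic data, not a
# real corpus): under Zipf's law the r-th most frequent word occurs roughly
# C / r times. We generate counts that follow the law, then recover the
# exponent from the log-log relationship between rank and frequency.
C = 1200.0
counts = [round(C / r) for r in range(1, 11)]  # word counts for ranks 1..10

# Least-squares slope of log(frequency) against log(rank).
xs = [math.log(r) for r in range(1, 11)]
ys = [math.log(c) for c in counts]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
    (x - mx) ** 2 for x in xs
)

print(slope)  # close to -1, the classic Zipf exponent
```

On real text the fitted exponent is typically near -1 but rarely exact, which is part of what the reviewed article examines.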

Cited by 532 publications (443 citation statements). References 106 publications (124 reference statements).
“…Next, languages were labeled in classes according to the linguistic family to which they belong (Romance, Germanic, Slavic, Uralic). The eight-dimensional vectors comprising the eight ApEn values (pattern lengths 3–10) are used to create the two-dimensional projection. We observe that the families are segregated.…”
Section: Results (mentioning)
confidence: 99%
“…Two representative findings of universal features of natural language are the Zipf and Heaps laws, which are based on word frequency and number of different words, respectively [1][2][3][4]. From a more basic perspective, human language can also be considered as a sequence of symbols which contains information encoded in the patterns (words) needed to communicate.…”
Section: Introduction (mentioning)
confidence: 99%
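The excerpt above pairs Zipf's law with Heaps' law, which describes how the number of distinct words grows sublinearly with text length: V(n) ≈ K·n^β with β typically between 0.4 and 0.6. A minimal sketch, where K and β are illustrative assumptions rather than values from the cited work:

```python
# Heaps' law sketch: vocabulary size V grows sublinearly with text length n,
# V(n) ~ K * n**beta. The constants below are illustrative assumptions,
# not fitted to any real corpus.
K, beta = 10.0, 0.5

def heaps_vocab(n: int) -> float:
    """Predicted number of distinct word types after n tokens under Heaps' law."""
    return K * n ** beta

# Doubling the text length multiplies the vocabulary by 2**beta, not by 2:
growth = heaps_vocab(2_000_000) / heaps_vocab(1_000_000)
print(growth)  # 2**0.5, about 1.414
```

The sublinear growth is the point: reading twice as much text yields well under twice as many new word types.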
“…In [1] an attempt is made to justify Zipf's law by relying on the features of human memory. This rationale is useful for understanding the hidden features that are universally manifested in virtually all large enough texts.…”
Section: Related Work (mentioning)
confidence: 99%
“…Rank frequency distributions are found in contemporary natural language corpora and Swadesh lists [19][20][21], comparisons across multiple languages [22][23][24][25], in both written and spoken language data [26], across all English literary texts included in Project Gutenberg [27], and historic language data that is not yet translated [28], but, importantly, are not found in random monkey-typing corpora [14,29]. Rank frequency research has expanded beyond a narrow focus on adult, monolingual, native speakers to demonstrate distinct rank frequency distributions for corpora of varying levels of L2 proficiency across users of natural language [30,31] and artificial command languages [32], L1 attritors who have lost proficiency in their L1 over their lifespan [31], different language combinations of spontaneous codeswitching [33], and in languages with varying proportions of non-native speakers [34].…”
Section: Introduction (mentioning)
confidence: 99%