2012
DOI: 10.1038/srep00943

Languages cool as they expand: Allometric scaling and the decreasing need for new words

Abstract: We analyze the occurrence frequencies of over 15 million words recorded in millions of books published during the past two centuries in seven different languages. For all languages and chronological subsets of the data we confirm that two scaling regimes characterize the word frequency distributions, with only the more common words obeying the classic Zipf law. Using corpora of unprecedented size, we test the allometric scaling relation between the corpus size and the vocabulary size of growing languages to de…
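
The allometric scaling test mentioned in the abstract can be illustrated with a minimal sketch: if vocabulary size grows as a power of corpus size, the exponent is the slope of a straight-line fit in log-log space. The function name `fit_scaling_exponent` and the synthetic data below are assumptions made for illustration only; this is not the authors' procedure or data.

```python
import math

def fit_scaling_exponent(corpus_sizes, vocab_sizes):
    """Least-squares slope and intercept of log(vocab) against log(corpus)."""
    xs = [math.log(n) for n in corpus_sizes]
    ys = [math.log(v) for v in vocab_sizes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Synthetic (made-up) data generated from vocab = 10 * corpus**0.5
corpus = [10**4, 10**5, 10**6, 10**7]
vocab = [10 * n**0.5 for n in corpus]
b, log_a = fit_scaling_exponent(corpus, vocab)
```

A fitted slope below 1 would indicate sublinear vocabulary growth, that is, a decreasing need for new words as the corpus expands.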

Cited by 196 publications (244 citation statements)
References 57 publications (101 reference statements)
“…The multi-billion-word Google Books Corpus, which spans approximately 200 years of fiction writing, has also been used recently for research on lexical change (e.g. Petersen et al 2012a; Petersen et al 2012b).…”
Section: Analyzing Lexical Emergence in Moder… (mentioning)
confidence: 99%
“…This is not the first paper to draw on the Google Books N-Gram Corpus. Since the seminal publication introducing this resource as well as the new concept of 'culturomics' [13], papers have been published that have analysed patterns of evolution of the usage of different words from a variety of perspectives, such as random fractal theory [14], Zipf's and Heaps' laws and their generalization to two-scaling regimes [15], the evolution of self-organization of word frequencies over time in written English [16], statistical properties of word growth [17,18], socio-historical determinants of word length dynamics [19]; and other studies have been more specifically aimed at issues such as the sociology of changing values resulting from urbanization [20] or changing concepts of happiness [21].…”
Section: Introduction (mentioning)
confidence: 99%
“…This quantity gives the logarithmic rank variation per year, and is closely related to the log frequency return used by Petersen et al (2012a). For any given word i, ρ_i(t) represents a time series quantifying changes in relative word prevalence.…”
Section: Introduction (mentioning)
confidence: 99%
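
The relation between a log frequency return and a logarithmic rank variation can be sketched as follows. The excerpt does not give the exact definition of the rank-variation quantity, so the year-to-year difference of log ranks used here is an assumption, and the helper name `log_returns` and the yearly values are made up for illustration.

```python
import math

def log_returns(series):
    """Return ln(x[t+1]) - ln(x[t]) for consecutive yearly values."""
    return [math.log(b) - math.log(a) for a, b in zip(series, series[1:])]

# Made-up yearly relative frequencies and ranks for a single word
freqs = [1e-6, 2e-6, 2e-6, 1e-6]
ranks = [1000, 500, 500, 1000]

freq_returns = log_returns(freqs)    # positive when the word becomes more frequent
rank_variation = log_returns(ranks)  # negative when the word climbs toward rank 1
```

In this toy series the two quantities move in opposite directions, since a word that gains frequency moves to a numerically smaller rank.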
“…In particular, for the first time it was possible to study quantitatively aspects of cultural change as reflected in language (Michel et al, 2011; Greenfield, 2013), and rigorously assess overall vocabulary drift over the time span of two centuries (Bochkarev et al, 2014). Moreover, methods inspired in statistical mechanics of complex systems were used to study the dynamics of word birth and death (Petersen et al, 2012a), long-range fractal correlations in word frequencies over centuries (Gao et al, 2012), and the scaling behaviour of word frequencies over time as represented by Zipf's (1949) and Heaps' (1978) laws (Petersen et al, 2012b; Gerlach and Altmann, 2013).…”
Section: Introduction (mentioning)
confidence: 99%