2023
DOI: 10.1038/s41598-023-42327-3

A large quantitative analysis of written language challenges the idea that all languages are equally complex

Alexander Koplenig,
Sascha Wolfer,
Peter Meyer

Abstract: One of the fundamental questions about human language is whether all languages are equally complex. Here, we approach this question from an information-theoretic perspective. We present a large-scale quantitative cross-linguistic analysis of written language by training a language model on more than 6500 different documents as represented in 41 multilingual text collections consisting of ~3.5 billion words or ~9.0 billion characters and covering 2069 different languages that are spoken as a native language b…
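The abstract describes measuring complexity by training a language model on each document. The sketch below illustrates the general idea only. It is not the authors' model or pipeline, and it swaps in a simple adaptive character n-gram model. A document's average code length in bits per character is computed with prequential coding: each character is scored before the model learns from it. The function name `prequential_bits_per_char`, the context length `order`, the smoothing constant `alpha` and the choice of alphabet are all hypothetical, illustrative assumptions.

```python
import math
from collections import defaultdict


def prequential_bits_per_char(text: str, order: int = 2, alpha: float = 0.5) -> float:
    """Average code length (bits per character) of `text` under a hypothetical
    adaptive order-`order` character model with add-`alpha` smoothing.

    Each character is scored *before* the model is updated with it
    (prequential coding), so the model is never charged for a character
    it has already seen at that position.
    """
    if not text:
        raise ValueError("text must be non-empty")
    # Assumption: the alphabet is the set of characters occurring in `text`.
    alphabet_size = len(set(text))
    counts = defaultdict(lambda: defaultdict(int))  # context -> next char -> count
    totals = defaultdict(int)                       # context -> number of observations
    total_bits = 0.0
    for i, ch in enumerate(text):
        context = text[max(0, i - order):i]  # up to `order` preceding characters
        p = (counts[context][ch] + alpha) / (totals[context] + alpha * alphabet_size)
        total_bits -= math.log2(p)           # code length of this character in bits
        counts[context][ch] += 1             # update the model only after scoring
        totals[context] += 1
    return total_bits / len(text)
```

Scoring before updating keeps the estimate honest, because the model never predicts a character it has already been trained on at that position. A stronger model, such as one with longer contexts, better smoothing or a neural language model, would typically assign higher probabilities to natural text and therefore yield lower values.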

Cited by 2 publications (20 citation statements). References: 94 publications.

“…Yet, depending on how you count, there are between 6000 and 8000 different languages and language varieties on the planet [13–15] that vary widely in their structural properties [16, 17]. A growing body of cross-linguistic research has begun to document that the natural and social environments in which languages are being used and learned drive this diversity [18–21], that language structure is influenced by socio-demographic factors such as the estimated number of speakers [18, 21–23] and that the long-held belief in a principle of "invariance of language complexity" [24] may be incorrect [25].…”
Section: Introduction (mentioning; confidence: 99%)

“…faster) to learn: in a large-scale quantitative cross-linguistic analysis, Ref. 25 trained an LM on more than 6500 documents in over 2000 different languages and statistically inferred the entropy rate of each document, which can be seen as an index of the underlying language complexity [28, 33–35]. The results showed that documents in languages with more speakers tended to be more complex.…”
Section: Introduction (mentioning; confidence: 99%)
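
The entropy rate mentioned in this statement has a standard information-theoretic definition. The following uses standard notation and is not taken from the paper. For a stationary source $X_1, X_2, \dots$, the entropy rate is

$$
h = \lim_{n\to\infty} \frac{1}{n} H(X_1,\dots,X_n).
$$

It also shows why the average code length of a model, including a trained language model, works as an estimate that errs on the high side in expectation. For any model $q$ assigning conditional probabilities $q(x_i \mid x_1,\dots,x_{i-1})$, let $q_n(x_1,\dots,x_n)=\prod_{i=1}^{n} q(x_i \mid x_1,\dots,x_{i-1})$, and let $p_n$ be the true joint distribution. Then

$$
\mathbb{E}\left[-\frac{1}{n}\sum_{i=1}^{n}\log_2 q(X_i \mid X_1,\dots,X_{i-1})\right]
= \frac{1}{n} H(X_1,\dots,X_n) + \frac{1}{n} D\!\left(p_n \,\|\, q_n\right)
\ge \frac{1}{n} H(X_1,\dots,X_n),
$$

because the Kullback–Leibler divergence $D$ is non-negative. A better model shrinks the divergence term, so its per-symbol code length moves closer to $h$.
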