The search for ever deeper relationships among the world's languages is bedeviled by the fact that most words evolve too rapidly to preserve evidence of their ancestry beyond 5,000 to 9,000 y. On the other hand, quantitative modeling indicates that some "ultraconserved" words exist that might be used to find evidence for deep linguistic relationships beyond that time barrier. Here we use a statistical model, which takes into account the frequency with which words are used in common everyday speech, to predict the existence of a set of such highly conserved words among seven language families of Eurasia postulated to form a linguistic superfamily that evolved from a common ancestor around 15,000 y ago. We derive a dated phylogenetic tree of this proposed superfamily with a time-depth of ∼14,450 y, implying that some frequently used words have been retained in related forms since the end of the last ice age. Words used more than once per 1,000 in everyday speech were 7 to 10 times more likely to show deep ancestry on this tree. Our results suggest a remarkable fidelity in the transmission of some words and give theoretical justification to the search for features of language that might be preserved across wide spans of time and geography.

Keywords: cultural evolution | phylogeny | historical linguistics

The English word brother and the French frère are related to the Sanskrit bhratr and the Latin frater, suggesting that words as mere sounds can remain associated with the same meaning for millennia. But how far back in time can traces of a word's genealogical history persist, and can we predict which words are likely to show deep ancestry? These questions are central to understanding language evolution and to efforts to identify linguistic superfamilies uniting the world's languages (1-5).
Evidence for proposed superfamilies, such as Amerind (6), linking most of the language families of the New World, and Nostratic (7-9) and Eurasiatic (3, 4, 10), linking the major language families of Eurasia, is often based on the identification of putative "cognate" words (analogous to homology in biology), the sound and meaning correspondences of which are thought to indicate that they derive from common ancestral words.
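The reported frequency–retention relationship (words used more than once per 1,000 being 7 to 10 times more likely to show deep ancestry) can be sketched as a simple logistic regression of a deep-ancestry indicator on log usage frequency. This is my own minimal illustration, not the authors' actual model, and the data points are invented.

```python
# Minimal sketch (not the authors' model): relate a word's usage
# frequency to the odds that it shows deep ancestry, via a
# hand-rolled logistic regression on invented illustrative data.
import math

# (frequency per 1,000 words of speech, shows_deep_ancestry) -- hypothetical
data = [(2.5, 1), (1.8, 1), (1.2, 1), (0.9, 0), (0.4, 0), (0.1, 0),
        (3.0, 1), (0.05, 0), (1.5, 0), (0.7, 1)]

def fit_logistic(points, lr=0.1, steps=5000):
    """Fit y ~ sigmoid(a + b * log(freq)) by batch gradient ascent."""
    a, b = 0.0, 0.0
    for _ in range(steps):
        ga = gb = 0.0
        for freq, y in points:
            x = math.log(freq)
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            ga += y - p          # gradient of log-likelihood w.r.t. a
            gb += (y - p) * x    # gradient of log-likelihood w.r.t. b
        a += lr * ga / len(points)
        b += lr * gb / len(points)
    return a, b

a, b = fit_logistic(data)
# A positive slope b means higher-frequency words are predicted to be
# more likely to retain detectable deep ancestry.
print(b > 0)
```

On these toy data the fitted slope is positive, mirroring the direction (though not the magnitude) of the effect described in the abstract.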
We propose, and provide corpus-based support for, a usage-based explanation for cross-linguistic trends in the coding of causal–noncausal verb pairs, such as raise/rise, break (tr.)/break (intr.). While English mostly uses the same verb form for both the causal and the noncausal sense (labile coding), most languages have extra coding for the causal verb (causative coding) and/or for the noncausal verb (anticausative coding). Causative and anticausative coding is not randomly distributed (Haspelmath 1993): some verb meanings, such as ‘freeze’, ‘dry’ and ‘melt’, tend to be coded as causatives, while others, such as ‘break’, ‘open’ and ‘split’, tend to be coded as anticausatives. We propose an explanation of these coding tendencies on the basis of the form–frequency correspondence principle, a general efficiency principle that is responsible for many grammatical asymmetries and is ultimately grounded in the predictability of frequently expressed meanings. In corpus data from seven languages, we find that verb pairs whose noncausal member is more frequent tend to be coded as causatives, while verb pairs whose causal member is more frequent tend to be coded as anticausatives. Our approach implies that linguists should not rely on form–meaning parallelism when trying to explain cross-linguistic or language-particular patterns in this domain.
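The form–frequency correspondence logic can be illustrated mechanically: the less frequent (less predictable) member of a pair is expected to carry the extra coding. The sketch below is my own toy illustration, not the authors' code, and all frequencies are invented.

```python
# Toy illustration of the form-frequency correspondence prediction:
# the rarer member of a causal/noncausal verb pair is expected to
# carry the overt coding. Frequencies are invented for illustration.
pairs = {
    # meaning: (causal_freq, noncausal_freq) per million tokens (hypothetical)
    "break":  (120, 40),   # causal use more frequent
    "open":   (90, 55),
    "freeze": (12, 30),    # noncausal use more frequent
    "melt":   (10, 25),
}

def predicted_coding(causal_freq, noncausal_freq):
    """The less frequent member is predicted to get overt coding."""
    if causal_freq > noncausal_freq:
        return "anticausative"  # noncausal member is rarer -> coded
    return "causative"          # causal member is rarer -> coded

for meaning, (c, n) in pairs.items():
    print(meaning, predicted_coding(c, n))
```

With these invented counts, ‘break’ and ‘open’ come out anticausative and ‘freeze’ and ‘melt’ causative, matching the tendencies named in the abstract.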
We present data from 17 languages on the frequency with which a common set of words is used in everyday language. The languages are drawn from six language families representing 65 per cent of the world's 7000 languages. Our data were collected from linguistic corpora that record frequencies of use for the 200 meanings in the widely used Swadesh fundamental vocabulary. Our interest is to assess evidence for shared patterns of language use around the world, and for the relationship of language use to rates of lexical replacement, defined as the replacement of a word by a new unrelated or non-cognate word. Frequencies of use for words in the Swadesh list range from just a few per million words of speech to 191 000 or more. The average inter-correlation among languages in the frequency of use across the 200 words is 0.73 (p < 0.0001). The first principal component of these data accounts for 70 per cent of the variance in frequency of use. Elsewhere, we have shown that frequently used words in the Indo-European languages tend to be more conserved, and that this relationship holds separately for different parts of speech. A regression model combining the principal factor loadings derived from the worldwide sample along with their part of speech predicts 46 per cent of the variance in the rates of lexical replacement in the Indo-European languages. This suggests that Indo-European lexical replacement rates might be broadly representative of worldwide rates of change. Evidence for this speculation comes from using the same factor loadings and part-of-speech categories to predict a word's position in a list of 110 words ranked from slowest to most rapidly evolving among 14 of the world's language families. This regression model accounts for 30 per cent of the variance.
Our results point to a remarkable regularity in the way that human speakers use language, and hint that the words for a shared set of meanings have been evolving, some slowly and others more rapidly, throughout human history.
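The reported average inter-correlation of frequency of use across languages can be sketched as pairwise Pearson correlations over log frequencies for a shared meaning list. The numbers below are invented for illustration and are far smaller than the actual 200-meaning, 17-language dataset.

```python
# Sketch (with invented numbers) of the computation behind the
# reported average inter-correlation: Pearson correlations of log
# frequencies of use for the same meanings across languages.
import math

# per-million frequencies for five Swadesh-style meanings in three
# hypothetical languages (illustrative values only)
freqs = {
    "lang_A": [9000, 2500, 600, 150, 12],
    "lang_B": [11000, 1800, 900, 90, 20],
    "lang_C": [8000, 3000, 450, 200, 9],
}

def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

logs = {k: [math.log(v) for v in vals] for k, vals in freqs.items()}
langs = list(logs)
corrs = [pearson(logs[a], logs[b])
         for i, a in enumerate(langs) for b in langs[i + 1:]]
avg_corr = sum(corrs) / len(corrs)
print(avg_corr)  # high for these toy profiles, which rank meanings similarly
```

Because the three toy profiles rank the meanings almost identically, the average correlation comes out high, analogous in kind (not in value) to the 0.73 reported.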
Loanword use has dominated the literature on language contact, and its salient nature continues to draw interest from linguists and non-linguists. Traditionally, loanwords were investigated by means of raw frequencies, which are at best uninformative and at worst misleading. Following a new wave of studies which look at loans from a quantitatively more informed standpoint, modelling "success" by taking into account the frequency of the counterparts available in the language adopting the loanwords, we propose a similar model of loan use and demonstrate its benefits in a case study of loanwords from Māori into (New Zealand) English. Our model contributes to previous work in this area by combining the success measure mentioned above with a rich range of linguistic characteristics of the loanwords (such as loan length and word class), as well as a similarly detailed group of sociolinguistic characteristics of the speakers using them (gender, age and ethnicity of both speakers and addressees). Our model is unique in bringing together all of these factors at the same time. The findings presented here illustrate the benefit of a quantitatively balanced approach to modelling loanword use. Furthermore, they illustrate the complex interaction between linguistic and sociolinguistic factors in such language contact scenarios.
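The "success" measure described above contrasts a loan's frequency with that of its native counterpart, rather than using raw counts. A minimal sketch, with invented counts and a hypothetical helper name:

```python
# Hedged sketch of the counterpart-relative "success" measure: the
# share of uses in which the loanword was chosen over its native
# counterpart. Counts below are invented for illustration.
def loan_success(loan_count, counterpart_count):
    """Proportion of loan uses among all uses of loan + counterpart."""
    total = loan_count + counterpart_count
    if total == 0:
        raise ValueError("no attested uses of either variant")
    return loan_count / total

# hypothetical Maori loans in a New Zealand English corpus sample
print(loan_success(80, 20))   # widely adopted relative to its counterpart
print(loan_success(5, 95))    # rarely chosen over the English word
```

A raw count of 80 versus 5 would say little on its own; normalising by the counterpart's frequency is what makes the two loans comparable.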