When speakers of different languages interact, they are likely to influence each other: contact leaves traces in the linguistic record, which in turn can reveal geographical areas of past human interaction and migration. However, other factors may contribute to similarities between languages. Inheritance from a shared ancestral language and universal preference for a linguistic property may both overshadow contact signals. How can we find geographical contact areas in language data, while accounting for the confounding effects of inheritance and universal preference? We present sBayes, an algorithm for Bayesian clustering in the presence of confounding effects. The algorithm learns which similarities are better explained by confounders, and which are due to contact effects. Contact areas are free to take any shape or size, but an explicit geographical prior ensures their spatial coherence. We test sBayes on simulated data and apply it in two case studies to reveal language contact in South America and the Balkans. Our results are supported by findings from previous studies. While we focus on detecting language contact, the method can also be used to uncover other traces of shared history in cultural evolution, and more generally, to reveal latent spatial clusters in the presence of confounders.
Phylogenetic trees are a central tool for studying language evolution and have wide implications for understanding cultural evolution as a whole. For example, they have been the basis of studies on the evolution of musical instruments, religious beliefs and political complexity. Bayesian phylogenetic methods are transparent regarding the data and assumptions underlying the inference. One of these assumptions, that languages change independently, is incompatible with the reality of language evolution, particularly with language contact. When speakers interact, languages frequently borrow linguistic traits from each other. Phylogenetic methods ignore this issue, which can lead to errors in the reconstruction. More importantly, they neglect the rich history of language contact. A principled way of integrating language contact in phylogenetic methods is sorely missing. We present a Bayesian phylogenetic model with horizontal transfer for language evolution. The model efficiently infers the phylogenetic tree of a language family and contact events between its clades. The implementation is available as a package for the phylogenetics software BEAST 2. We apply the model in a simulation study and a case study on a subset of well-documented Indo-European languages. The simulation study demonstrates that the model correctly reconstructs the history of a simulated language family, including simulated contact events. Moreover, it shows that ignoring contact can lead to systematic errors in the estimated tree height, rate of change and tree topology, which can be avoided with our model. The case study confirms that the model reconstructs known contact events in the history of Indo-European and finds known loanwords, demonstrating its practical potential. The model has a higher statistical fit to the data than a conventional phylogenetic reconstruction, and the reconstructed tree height is significantly closer to well-attested estimates.
Our method closes a long-standing gap between the theoretical and empirical models of cultural evolution. The implications are especially relevant for less documented language families, where our knowledge of past contacts and linguistic borrowings is limited. Since linguistic phylogenies have become the backbone of many studies of cultural evolution, the addition of this integral piece of the puzzle is crucial in the endeavour to understand the history of human culture.
Bayesian phylogeography has been used in historical linguistics to reconstruct homelands and expansions of language families, but the reliability of these reconstructions has remained unclear. We contribute to this discussion with a simulation study where we distinguish two types of spatial processes: migration, where populations or languages leave one place for another, and expansion, where populations or languages gradually expand their territory. We simulate migration and expansion in two scenarios with varying degrees of spatial directional trends and evaluate the performance of state-of-the-art phylogeographic methods. Our results show that these methods fail to reconstruct migrations, but work surprisingly well on expansions, even under severe directional trends. We demonstrate that migrations and expansions have typical phylogenetic and spatial patterns, which in the one case inhibit and in the other facilitate phylogeographic reconstruction. Furthermore, we propose descriptive statistics to identify whether a real sample of languages, their relationship and spatial distribution, better fits a migration or an expansion scenario. Bringing together the results of the simulation study and theoretical arguments, we make recommendations for assessing the adequacy of phylogeographic models to reconstruct the spatial evolution of languages.
Bayesian phylogeography aims to reconstruct migrations in evolutionary processes. This methodological framework has been used for the reconstruction of homelands and historical expansions of various language families, but its reliability for language diversification research has remained unclear. We contribute to this discussion with a simulation study where we distinguish two types of spatial processes: migration and expansion. By migration we denote long-distance movement of whole populations, leaving their previous habitat empty. Expansions are small-scale movements of speakers or inclusions of new speakers into the language community, cumulatively contributing to a gradual spread into new territories. We simulate migrations, in the form of directional random walks, and expansions, in the form of a grid-based region-growing process. We run both simulation scenarios with varying degrees of directional trends and evaluate the performance of state-of-the-art phylogeographic methods. Our results show that phylogeography fails to reconstruct migrations, but works surprisingly well on expansions, even under severe directional trends. We demonstrate that migrations and expansions have typical phylogenetic and spatial patterns, which in the one case inhibit and in the other facilitate phylogeographic reconstruction. Furthermore, we propose descriptive statistics to identify whether a real sample of languages (Bantu), their relationship and spatial distribution, better fits a migration or an expansion scenario. Bringing together the results of the simulation study and theoretical arguments, we make recommendations for judging the adequacy of phylogeographic models to reconstruct the spatial evolution of languages.
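The two spatial processes described above can be pictured with a minimal Python sketch. It is not the authors' simulation code: the function names, the Gaussian step noise, the four-neighbour grid and the `bias_east` weighting are all assumptions chosen for illustration. A migration is modelled as a directional random walk in which the population leaves its previous location, and an expansion as a region that grows one grid cell at a time while keeping everything it already occupies.

```python
import random


def directional_random_walk(start, steps, drift=(0.5, 0.0), step_sd=1.0, seed=0):
    """Migration sketch (hypothetical): the whole population moves each step.

    Every step adds a constant drift (the directional trend) plus Gaussian
    noise. Only the current position is occupied; earlier ones are abandoned.
    """
    rng = random.Random(seed)
    x, y = start
    path = [(x, y)]
    for _ in range(steps):
        x += drift[0] + rng.gauss(0.0, step_sd)
        y += drift[1] + rng.gauss(0.0, step_sd)
        path.append((x, y))
    return path


def grow_region(start, steps, size=21, bias_east=0.0, seed=0):
    """Expansion sketch (hypothetical): grid-based region growing.

    Each step occupies one free cell adjacent to the current territory.
    Eastward candidates get weight 1 + bias_east (the directional trend,
    assumed > -1). A free cell bordering several occupied cells appears
    several times in the candidate list and is therefore more likely.
    """
    rng = random.Random(seed)
    occupied = {start}
    for _ in range(steps):
        frontier, weights = [], []
        for cx, cy in sorted(occupied):
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                cell = (cx + dx, cy + dy)
                inside = 0 <= cell[0] < size and 0 <= cell[1] < size
                if cell in occupied or not inside:
                    continue
                frontier.append(cell)
                weights.append(1.0 + bias_east if dx == 1 else 1.0)
        if not frontier:
            break  # the grid is full
        occupied.add(rng.choices(frontier, weights=weights, k=1)[0])
    return occupied


migration_path = directional_random_walk((0.0, 0.0), steps=50)
territory = grow_region((10, 10), steps=30, bias_east=2.0)
```

In this sketch the migrating population ends up at a single point far from its origin, whereas the expanding territory always still contains the starting cell, which hints at why the two processes leave different spatial signals.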
When speakers of two or more languages interact, they are likely to influence each other: contact leaves traces in the linguistic record, which in turn can reveal geographic areas of past human interaction and migration. However, the complex, multi-dimensional nature of contact has hindered the development of a rigorous methodology for detecting its traces. Specifically, other factors may contribute to similarities between languages. Inheritance (a property is passed from an ancestor to several descendant languages) and universal preference (a property is universally preferred) may both overshadow contact signals. How can we find geographic contact areas in language data, while accounting for the confounding effects of inheritance and universal preference? We present sBayes, an algorithm for Bayesian clustering in the presence of confounding effects. The algorithm learns which similarities in a set of features are better accounted for by confounders, and which are due to contact effects. Contact areas are free to take any shape or size, but an explicit geographic prior ensures their spatial coherence. We test the clustering method on simulated data and apply it in two case studies to reveal language contact in South America and the Balkans. Our results are supported by (mostly qualitative) findings from previous studies. While we focus on the specific problem of language contact, the method can also be used to uncover other traces of shared history in cultural evolution, and more generally, to reveal latent spatial clusters in the presence of confounders.
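One way to picture the idea of attributing a similarity either to confounders or to contact is a small mixture calculation. The abstract does not spell out the model, so the sketch below is an assumption for illustration only, not the sBayes implementation: the function `responsibilities`, the three component names and all numbers are hypothetical. Given how probable an observed feature value is under each explanation and how much weight each explanation carries, Bayes' rule gives the share of the observation attributable to each one.

```python
def responsibilities(value_probs, weights):
    """Posterior share of each explanation for one observed feature value.

    value_probs: P(observed value | explanation), per explanation (hypothetical).
    weights:     prior weight of each explanation, summing to 1 (hypothetical).
    """
    joint = {name: weights[name] * value_probs[name] for name in weights}
    total = sum(joint.values())
    return {name: share / total for name, share in joint.items()}


# Illustrative numbers only: the value is rare worldwide and in the
# language's family, but common among its geographic neighbours.
shares = responsibilities(
    value_probs={"universal": 0.2, "inheritance": 0.1, "contact": 0.9},
    weights={"universal": 0.3, "inheritance": 0.3, "contact": 0.4},
)
best = max(shares, key=shares.get)
```

With these made-up inputs most of the probability mass falls on the contact explanation, because the observed value is unlikely under either confounder.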