Background: The demographic history of South and Southeast Asia (S&SEA) is complex and contentious, with multiple waves of human migration. Some of the earliest arrivals were the ancestors of modern Austroasiatic (AA) language speakers. Understanding the history of the AA language family, which comprises over 150 languages whose speakers are distributed across a broad geographical region in small, isolated populations of various sizes, can help shed light on the peopling of S&SEA. Here we investigated the genetic relatedness of two AA groups, their relationship with other ethno-linguistically distinct populations, and the relationship of these groups with ancient genomes of individuals who lived in S&SEA at different time periods, in order to infer the demographic history of this region.
Results: We analyzed 1451 extant genomes, including 189 from AA speakers in India and Malaysia, and 43 ancient genomes from S&SEA. Population structure analysis reveals that neither language nor geography correlates well with genetic diversity. The inconsistency between "language and genetics" or "geography and genetics" can largely be attributed to ancient admixture with East Asian populations. We estimated a pre-Neolithic origin of AA language speakers, with shared ancestry between Indian and Malaysian populations until about 470 generations ago, contesting the existing model of a Neolithic expansion of the AA culture. We observed a spatio-temporal transition in the genetic ancestry of SEA, with the genetic contribution from East Asia increasing significantly in the post-Neolithic period.
Conclusion: Our study shows that, contrary to assumptions in many previous studies and despite their linguistic commonality, Indian AAs have a distinct genomic structure compared to Malaysian AAs. This linguistic-genetic discordance reflects the complex history of population migration and admixture that shaped the genomic landscape of S&SEA.
We postulate that the pre-Neolithic ancestors of today's AAs were widespread in S&SEA, and that the fragmentation and dissipation of the population were largely the result of multiple migrations of East Asian farmers during the Neolithic period. Our study also highlights the resilience of AAs in continuing to speak their languages in spite of a patchy population distribution and possible dominance by other linguistic groups.
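To see why a split about 470 generations ago is read as pre-Neolithic, it helps to convert generations into calendar years. The sketch below does only that arithmetic. The generation times of 25 and 29 years are illustrative assumptions, not values stated in the abstract, and the function name is hypothetical.

```python
def generations_to_years(generations, years_per_generation):
    """Convert a time measured in generations into calendar years."""
    return generations * years_per_generation


# Split time reported in the abstract (approximate).
SPLIT_GENERATIONS = 470

# Assumed generation times in years; the abstract does not state which
# value the authors used, so these are illustrative only.
ASSUMED_GENERATION_TIMES = (25, 29)

for years_per_generation in ASSUMED_GENERATION_TIMES:
    years = generations_to_years(SPLIT_GENERATIONS, years_per_generation)
    print(f"{years_per_generation} years/generation: about {years} years ago")
```

Under either assumed generation time the split falls more than 11,000 years before the present, which is the sense in which the estimate predates the Neolithic expansion discussed in the abstract.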
India and Southeast Asia are home to diverse linguistic groups, including the Austroasiatic language family. Austroasiatic speakers live in scattered settlements across these regions, and what led to such a dispersed distribution over this vast geographical space is yet to be resolved. Our work aims to reconstruct the migration route of early Austroasiatic settlers and to examine their relationship with other linguistic groups. We genotyped 511 unrelated individuals from India and Malaysia, of whom 189 were Austroasiatic. The rest belonged to the Indo-European, Dravidian, Tibeto-Burman and Austronesian language families. Jarawa and Onge populations from the Andaman and Nicobar Islands were also included. Our genotype data were combined with those of 940 individuals from the HGDP dataset. We analyzed nearly 0.3 million autosomal SNPs and found that the allele frequency correlation between Malaysian Austroasiatics and Indian Tibeto-Burmans was slightly higher (R² = 0.77) than that between Malaysian and Indian Austroasiatics (R² = 0.72). Principal Components Analysis revealed that Malaysian Austroasiatics clustered closer to Tibeto-Burmans than to Indian Austroasiatics. A similar clustering pattern was obtained in the fineSTRUCTURE cluster dendrogram. In the ADMIXTURE analysis, the genetic component that is modal in Malaysian Austroasiatics was also significantly higher among Tibeto-Burmans than among Indian Austroasiatics (P < 2.117e-10), indicating that genetic distance correlates better with geography than with language. Studying segments that were identical by descent (IBD) between individuals belonging to two different linguistic groups, i.e. Austroasiatic and Tibeto-Burman, we found that Tibeto-Burmans share a larger number of segments with Malaysian Austroasiatics, but that these segments are smaller overall. On the other hand, the segments shared between the two Austroasiatic populations (India and Malaysia) are comparatively larger (P = 0.034) but fewer in number.
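The allele frequency comparison above can be illustrated with a small sketch. It assumes that R² means the squared Pearson correlation of per-SNP allele frequencies between two populations, which the abstract does not spell out. The data are simulated with a toy noise model, and the population names and the helper `drift` are hypothetical.

```python
import random


def allele_freq_r2(freqs_a, freqs_b):
    """Squared Pearson correlation between two per-SNP allele-frequency lists."""
    if len(freqs_a) != len(freqs_b) or len(freqs_a) < 2:
        raise ValueError("need two equal-length lists with at least 2 SNPs")
    mean_a = sum(freqs_a) / len(freqs_a)
    mean_b = sum(freqs_b) / len(freqs_b)
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(freqs_a, freqs_b))
    var_a = sum((a - mean_a) ** 2 for a in freqs_a)
    var_b = sum((b - mean_b) ** 2 for b in freqs_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("allele frequencies must vary across SNPs")
    return cov * cov / (var_a * var_b)


def drift(freqs, sd, rng):
    """Toy divergence model (an assumption): Gaussian noise, clipped to [0, 1]."""
    return [min(1.0, max(0.0, f + rng.gauss(0.0, sd))) for f in freqs]


# Simulated data only; none of these numbers come from the study.
rng = random.Random(0)
ancestral = [rng.uniform(0.05, 0.95) for _ in range(5000)]
reference = drift(ancestral, 0.05, rng)    # stand-in for one population
close_pop = drift(ancestral, 0.05, rng)    # little divergence from the ancestor
distant_pop = drift(ancestral, 0.15, rng)  # more divergence from the ancestor

r2_close = allele_freq_r2(reference, close_pop)
r2_distant = allele_freq_r2(reference, distant_pop)
print(f"R^2 with close population:   {r2_close:.2f}")
print(f"R^2 with distant population: {r2_distant:.2f}")
```

In this toy setting the less diverged pair shows the higher R², which is the kind of contrast the reported values of 0.77 and 0.72 express. A real analysis would estimate the frequencies from genotype counts at the same SNPs in each population.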
Our analyses indicate that Malaysian Austroasiatics and Tibeto-Burmans initially split from a common ancestor. A small group of individuals then separated from the Malaysian Austroasiatics, giving rise to the present-day Indian Austroasiatics. TreeMix and D-statistics analyses provided evidence for gene flow between Malaysian Austroasiatics and Tibeto-Burmans after the split. Meanwhile, the southward migration of East Asians resulted in extensive genetic exchange between East Asians and Tibeto-Burmans, as was evident in our ADMIXTURE analysis. This subsequent genetic exchange might have shaped the present-day language structure.
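For readers unfamiliar with D-statistics, the following is a minimal sketch of the frequency-based ABBA-BABA form of the statistic for four populations (P1, P2, P3, outgroup). It is a generic illustration, not the authors' pipeline: the abstract does not say which populations were placed in which position, the input frequencies below are made up, and the function name is hypothetical.

```python
def d_statistic(p1, p2, p3, outgroup):
    """Frequency-based ABBA-BABA D for the tree ((P1, P2), P3), outgroup.

    Each argument is a list of derived-allele frequencies, one per SNP.
    D > 0 suggests excess allele sharing between P2 and P3;
    D < 0 suggests excess sharing between P1 and P3.
    """
    abba = 0.0
    baba = 0.0
    for f1, f2, f3, fo in zip(p1, p2, p3, outgroup):
        abba += (1 - f1) * f2 * f3 * (1 - fo)
        baba += f1 * (1 - f2) * f3 * (1 - fo)
    total = abba + baba
    if total == 0:
        raise ValueError("no informative sites (ABBA + BABA is zero)")
    return (abba - baba) / total


# Made-up frequencies for three SNPs, for illustration only.
p1 = [0.10, 0.20, 0.05]
p2 = [0.60, 0.70, 0.40]
p3 = [0.80, 0.90, 0.50]
outgroup = [0.00, 0.00, 0.00]

print(f"D = {d_statistic(p1, p2, p3, outgroup):.3f}")
```

A value near zero is what a simple tree without gene flow predicts, and a consistent departure from zero is the kind of signal taken as evidence of gene flow after a split. In practice the significance of D is usually assessed with a block jackknife over the genome, which this sketch omits.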
Northeast India, with its unique geographic location between the Himalayas and the Bay of Bengal, has served as a passage for the movement of modern humans between the Indian subcontinent and East/Southeast Asia. In this study we examine the population genetics of the Khasi, a population that speaks a language (also known as Khasi) belonging to the Austroasiatic language family and that lives as an isolated group amidst Tibeto-Burman speakers. The Khasi language belongs to one of the three major sub-groups of the Austroasiatic language family, and the speakers of the three sub-groups are separated from each other by large geographical distances. The Khasi speakers are separated from the nearest speakers of the other Austroasiatic sub-groups: the "Mundari" sub-family in East and peninsular India and the "Mon-Khmers" in Mainland Southeast Asia. We found the Khasi population to be genetically distinct from other Austroasiatic speakers, i.e. Mundaris and Mon-Khmers, but relatively similar to the geographically proximal Tibeto-Burmans. The likely explanation for this genetic-linguistic discordance lies in the admixture history of different migration events that originated in East Asia and possibly proceeded towards Southeast Asia. We found at least two distinct migration events from East Asia. While the ancestors of today's Tibeto-Burman speakers were affected by both, the ancestors of the Khasis were insulated from the second migration event. Considering the linguistic similarity between Tibeto-Burman languages and the Sino-Tibetan languages of today's East Asians, we infer that the second wave of migration resulted in a linguistic transition among the ancestors of Tibeto-Burman speakers, while the Khasis preserved their linguistic identity.