The ethnogenesis of Kazakhs took place in Central Asia, a region of high genetic and cultural diversity. Although archaeological and historical studies have shed some light on the formation of modern Kazakhs, the process by which a hierarchical socioeconomic structure was established in the Steppe remains contentious. In this study, we analyzed haplotype variation at 15 Y-chromosomal short tandem repeats in 1171 individuals from 24 tribes representing the three socio-territorial subdivisions (Senior, Middle and Junior zhuz) of Kazakhstan to comprehensively characterize the patrilineal genetic architecture of the Kazakh Steppe. In total, 577 distinct haplotypes were identified, each belonging to one of 20 haplogroups; 16 predominant haplogroups were confirmed by SNP genotyping. The haplogroup distribution was skewed towards C2-M217, which was present in all tribes at an overall frequency of 51.9%. Despite signatures of spatial differences in haplotype frequencies, a Mantel test failed to detect a statistically significant correlation between genetic and geographic distances between individuals. An analysis of molecular variance found that ∼8.9% of the genetic variance among individuals was attributable to differences among zhuzes and ∼20% to differences among tribes within zhuzes. The STRUCTURE analysis of the 1164 individuals indicated the presence of 20 ancestral groups and a complex three-subclade organization of the C2-M217 haplogroup in Kazakhs, a result supported by the multidimensional scaling analysis. Additionally, while the majority of the haplotypes and tribes overlapped, a distinct cluster of the O2 haplogroup, mostly from the Naiman tribe, was observed. Thus, firstly, our analysis indicated that the majority of Kazakh tribes share deep heterogeneous patrilineal ancestries, while a smaller fraction of them are descendants of a founding paternal ancestor.
Secondly, we observed a high frequency of the C2-M217 haplogroup along the southern border of Kazakhstan, broadly corresponding to both the path of the Mongolian invasion and the ancient Silk Road. Interestingly, we detected three subclades of the C2-M217 haplogroup that broadly exhibit zhuz-specific clustering. Further study of Kazakh haplotype variation within a Central Asian context is required to untangle this complex process of ethnogenesis.
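For readers unfamiliar with the Mantel test mentioned in the abstract above, the following is a minimal sketch of a one-sided permutation version in plain Python. It is an illustration only: the abstract does not state which software, correlation statistic, or number of permutations the authors used, so the function names (`pearson`, `upper_triangle`, `mantel_test`), the Pearson statistic, and the default of 999 permutations are all assumptions.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def upper_triangle(matrix, order):
    """Flatten the strict upper triangle after reordering rows and columns."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel_test(genetic, geographic, n_perm=999, seed=0):
    """Return (observed r, one-sided permutation p-value).

    Both inputs are symmetric distance matrices over the same individuals.
    The number of permutations and the seed are illustrative defaults.
    """
    n = len(genetic)
    identity = list(range(n))
    genetic_vec = upper_triangle(genetic, identity)
    r_obs = pearson(genetic_vec, upper_triangle(geographic, identity))

    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        # Permute rows and columns of one matrix together, so that the
        # dependence between entries sharing an individual is preserved.
        order = identity[:]
        rng.shuffle(order)
        if pearson(genetic_vec, upper_triangle(geographic, order)) >= r_obs:
            hits += 1

    # Count the observed arrangement as one of the permutations.
    return r_obs, (hits + 1) / (n_perm + 1)
```

Permuting individuals, and not the individual matrix entries, is what distinguishes the Mantel test from an ordinary correlation test: pairwise distances that share an individual are not independent, so the null distribution is built by relabelling whole rows and columns.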
Genome-wide association studies (GWAS) have identified thousands of loci associated with human complex diseases and traits. How these loci are distributed through the genome has not been systematically evaluated. We hypothesised that the location of GWAS loci differs between ancestral linkage groups (ALGs) in a manner related to the paralogy and function of genes. We used data from the NHGRI-EBI GWAS Catalog to determine whether the density of GWAS loci relative to HapMap variants differed between ALGs, and whether ALGs were enriched for Experimental Factor Ontology (EFO) terms assigned to the GWAS traits. In gene-level analyses we explored the characteristics of genes linked to GWAS loci and of those mapping to the ALGs. We find that GWAS loci were enriched or deficient in 9 and 7 of the 17 ALGs respectively, while there was no difference in the number of GWAS loci in regions of the human genome unassigned to an ALG. All but 2 ALGs were significantly enriched or deficient for one or more EFO terms. Lastly, we find that genes assigned to an ALG are under higher levels of selective constraint, have longer coding sequences and have higher median expression in the tissue of highest expression than genes not mapping to an ALG. On the other hand, genes associated with GWAS loci have longer genomic length and exhibit higher levels of selective constraint relative to non-GWAS genes. Collectively, this suggests that understanding the location and ancestral origins of GWAS signals may be informative for the development of tools for variant prioritization and interpretation.
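As a rough illustration of what comparing the density of GWAS loci against a reference variant set can look like, the sketch below computes an enrichment ratio and a two-sided p-value from a normal approximation to the binomial. This is not the authors' procedure: the abstract does not name the statistical test used, and the function name `density_enrichment`, its parameters, and the counts in the usage lines are hypothetical.

```python
import math


def density_enrichment(gwas_in_group, gwas_total, ref_in_group, ref_total):
    """Compare the share of GWAS loci in one group with the reference share.

    Returns (enrichment ratio, z-score, two-sided p-value). The null
    hypothesis is that GWAS loci fall in the group in proportion to the
    reference variants (for example, HapMap variants) found there.
    """
    null_share = ref_in_group / ref_total
    expected = gwas_total * null_share
    ratio = gwas_in_group / expected

    # Normal approximation to the binomial; reasonable only for large counts.
    std_dev = math.sqrt(gwas_total * null_share * (1.0 - null_share))
    z = (gwas_in_group - expected) / std_dev
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return ratio, z, p_value


# Hypothetical counts, chosen only to show the call signature.
ratio, z, p_value = density_enrichment(
    gwas_in_group=150, gwas_total=1000, ref_in_group=10_000, ref_total=100_000
)
print(f"ratio={ratio:.2f}, z={z:.2f}, p={p_value:.2e}")
```

A ratio above 1 corresponds to enrichment and a ratio below 1 to deficiency. With 17 groups tested, a real analysis would also need a multiple-testing correction, which this sketch omits.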