To date, most genome-wide association studies (GWAS) and studies of fine-scale population structure have been conducted primarily on Europeans. Han Chinese, the largest ethnic group in the world, making up 20% of the global human population, is largely underrepresented in such studies. A well-recognized challenge is that population structure can cause spurious associations in GWAS. In this study, we examined population substructure in a diverse set of over 1700 Han Chinese samples collected from 26 regions across China, each genotyped at approximately 160K single-nucleotide polymorphisms (SNPs). Our results showed that the Han Chinese population is intricately substructured, with the main observed clusters corresponding roughly to northern Han, central Han, and southern Han. Simulated case-control studies showed that genetic differentiation among these clusters, although very small (F_ST = 0.0002 to 0.0009), is sufficient to inflate the rate of false-positive results even when the sample size is moderate. The two SNPs with the greatest frequency differences between the northern Han and southern Han clusters (F_ST > 0.06) were found in the FADS2 gene, which is associated with the fatty acid composition of phospholipids, and in the HLA complex P5 gene (HCP5), which is associated with HIV infection, psoriasis, and psoriatic arthritis. Ingenuity Pathway Analysis (IPA) showed that the most differentiated genes among clusters are involved in cardiac arteriopathy (p < 10^-101). These signals, which indicate significant differences among Han Chinese subpopulations, should be interpreted carefully if they are also detected in association studies, especially when sample sources are diverse.
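To give a feel for how small the reported F_ST values are, here is a minimal sketch of Wright's F_ST for a single biallelic SNP in two populations. It is an illustration only: the abstract does not say which F_ST estimator the study used, and the function name `fst_two_pops`, the equal weighting of the two populations, and the allele frequencies in the usage lines are all assumptions introduced here, not data from the study.

```python
def fst_two_pops(p1: float, p2: float) -> float:
    """Wright's F_ST for one biallelic SNP in two equally weighted populations.

    p1, p2 are the frequencies of the same allele in each population.
    This is the textbook definition (H_T - H_S) / H_T, not necessarily
    the estimator used in the study summarized above.
    """
    p_bar = (p1 + p2) / 2.0
    # Expected heterozygosity if both populations were pooled.
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    # Mean expected heterozygosity within each population.
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    if h_total == 0.0:
        # The allele is fixed or absent in both populations: no differentiation.
        return 0.0
    return (h_total - h_within) / h_total


# Hypothetical allele frequencies, chosen only for illustration.
small_gap = fst_two_pops(0.49, 0.51)
large_gap = fst_two_pops(0.40, 0.60)
print(f"{small_gap:.4f} {large_gap:.4f}")
```

Under these assumptions, a frequency gap of only two percentage points around 0.5 already yields an F_ST of about 0.0004, which sits inside the 0.0002 to 0.0009 range reported above.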
Genetic studies of Tibetans, an ethnic group with a long-lasting presence on the Tibetan Plateau, the highest plateau in the world, may offer a unique opportunity to understand the biological adaptations of human beings to high-altitude environments. We conducted a genome-wide study of 1,000,000 genetic variants in 46 Tibetans (TBN) and 92 Han Chinese (HAN) to identify signals of high-altitude adaptation (HAA) in Tibetan genomes. We discovered the most differentiated variants between TBN and HAN at chromosomes 1q42.2 and 2p21. EGLN1 (or HIFPH2, MIM 606425) and EPAS1 (or HIF2A, MIM 603349), both related to hypoxia-inducible factor, were the most differentiated genes in the two regions, respectively. Strong positive correlations were also observed between the frequency of TBN-dominant haplotypes in the two gene regions and altitude in East Asian populations. Linkage disequilibrium and further haplotype network analyses of worldwide populations suggested the antiquity of the TBN-dominant haplotypes and long-term persistence of natural selection. Finally, a "dominant haplotype carrier" hypothesis could describe the role of the two genes in HAA. All of our population genomic and statistical analyses indicate that EPAS1 and EGLN1 are most likely responsible for HAA in Tibetans. Interestingly, each of three recent studies also identified one of the two genes, but not both. We reanalyzed the available data and found that the missed top signal (EPAS1) could be recovered with data quality control and our approaches. Based on this experience, we call for more attention to be paid to controlling data quality and batch effects introduced when integrating public data. Our results also suggest limitations of extended haplotype homozygosity-based methods, whose power is compromised when natural selection began long ago, particularly in genomic regions with recombination hotspots.
How chromatin reorganization coordinates differentiation and lineage commitment from hematopoietic stem and progenitor cells (HSPCs) to mature immune cells is not well understood. Here, we carried out an integrative analysis of chromatin accessibility, topologically associating domains, AB compartments, and gene expression from HSPCs to CD4+CD8+ T cells. We found that abrupt genome-wide changes at all three levels of chromatin organization occur during the transition from double-negative stage 2 (DN2) to DN3, accompanying T lineage commitment. The transcription factor BCL11B, a critical regulator of T cell commitment, is associated with increased chromatin interaction, and Bcl11b deletion compromised chromatin interaction at its target genes. We propose that these large-scale and concerted changes in chromatin organization present an energy barrier that prevents the cell from reverting to earlier stages or redirecting to alternative fates, and thus lock the cell fate into the T lineage.
DNase I hypersensitive sites (DHSs) provide important information on the presence of transcriptional regulatory elements and the state of chromatin in mammalian cells [1–3]. Conventional DNase-Seq for genome-wide DHS profiling is limited by the requirement of millions of cells [4,5]. Here we report an ultrasensitive strategy, called Pico-Seq, for detection of genome-wide DHSs in single cells. We show that DHS patterns at the single-cell level are highly reproducible among individual cells. Across different single cells, highly expressed gene promoters and enhancers associated with multiple active histone modifications display constitutive DHSs, while chromatin regions with fewer histone modifications exhibit high variation in DHSs. Furthermore, the single-cell DHSs predict enhancers that regulate cell-specific gene expression programs, and the cell-to-cell variations in DHSs are predictive of gene expression. Finally, we apply Pico-Seq to pools of tumor cells and pools of normal cells, dissected from formalin-fixed paraffin-embedded (FFPE) tissue slides from thyroid cancer patients, and detect thousands of tumor-specific DHSs. Many of these DHSs are associated with promoters and enhancers critically involved in cancer development. Analysis of the DHS sequences uncovers a single-nucleotide variant (chr18:52417839 G>C) in the tumor cells of a follicular thyroid carcinoma patient, which affects the binding of the tumor suppressor protein p53 and correlates with decreased expression of its target gene TXNL1. In conclusion, Pico-Seq can reliably detect DHSs in single cells, greatly extending the range of applications of DHS analysis for both basic and translational research, and may provide critical information for personalized medicine.