Genome-wide association studies have revealed numerous risk loci associated with diverse diseases. However, identification of disease-causing variants within association loci remains a major challenge. Divergence in gene expression due to cis-regulatory variants in noncoding regions is central to disease susceptibility. We show that integrative computational analysis of phylogenetic conservation with a complexity assessment of co-occurring transcription factor binding sites (TFBS) can identify cis-regulatory variants and elucidate their mechanistic role in disease. Analysis of established type 2 diabetes risk loci revealed a striking clustering of distinct homeobox TFBS. We identified the PRRX1 homeobox factor as a repressor of PPARG2 expression in adipose cells and demonstrated its adverse effect on lipid metabolism and systemic insulin sensitivity, dependent on the rs4684847 risk allele that triggers PRRX1 binding. Thus, cross-species conservation analysis at the level of co-occurring TFBS provides a valuable contribution to the translation of genetic association signals to disease-related molecular mechanisms.
Genome-wide association studies have previously identified 23 genetic loci associated with circulating fibrinogen concentration. These studies used HapMap imputation and did not examine the X-chromosome. 1000 Genomes imputation provides better coverage of uncommon variants and includes indels. We conducted a genome-wide association analysis of 34 studies imputed to the 1000 Genomes Project reference panel, including ∼120 000 participants of European ancestry (95 806 participants with data on the X-chromosome). Approximately 10.7 million single-nucleotide polymorphisms and 1.2 million indels were examined. We identified 41 genome-wide significant fibrinogen loci, of which 18 were newly identified. There were no genome-wide significant signals on the X-chromosome. The lead variants of five significant loci were indels. We further identified six additional independent signals, including three rare variants, at two previously characterized loci: FGB and IRF1. Together, the 41 loci explain 3% of the variance in plasma fibrinogen concentration.
Key Points
Twelve independent, novel, low-frequency (n = 2) and rare (n = 10) genetic variants were associated with fibrinogen, FVII, FVIII, or vWF. Nine were within previously associated genes, and 3 novel candidate genes (KCNT1, HID1, and KATNB1) were confined to cohorts of African ancestry.