Approaches exploiting extremes of the trait distribution may reveal novel loci for common traits, but it is unknown whether such loci are generalizable to the general population. In a genome-wide search for loci associated with upper vs. lower 5th percentiles of body mass index, height and waist-hip ratio, as well as clinical classes of obesity including up to 263,407 European individuals, we identified four new loci (IGFBP4, H6PD, RSRC1, PPP2R2A) influencing height detected in the tails and seven new loci (HNF4G, RPTOR, GNAT2, MRPS33P4, ADCY9, HS6ST3, ZZZ3) for clinical classes of obesity. Further, we show that there is large overlap in terms of genetic structure and distribution of variants between traits based on extremes and the general population and little etiologic heterogeneity between obesity subgroups.
Elevated low-density lipoprotein cholesterol (LDL-C) is a treatable, heritable risk factor for cardiovascular disease. Genome-wide association studies (GWASs) have identified 157 variants associated with lipid levels but are not well suited to assess the impact of rare and low-frequency variants. To determine whether rare or low-frequency coding variants are associated with LDL-C, we exome sequenced 2,005 individuals, including 554 individuals selected for extreme LDL-C (>98(th) or <2(nd) percentile). Follow-up analyses included sequencing of 1,302 additional individuals and genotype-based analysis of 52,221 individuals. We observed significant evidence of association between LDL-C and the burden of rare or low-frequency variants in PNPLA5, encoding a phospholipase-domain-containing protein, and both known and previously unidentified variants in PCSK9, LDLR and APOB, three known lipid-related genes. The effect sizes for the burden of rare variants for each associated gene were substantially higher than those observed for individual SNPs identified from GWASs. We replicated the PNPLA5 signal in an independent large-scale sequencing study of 2,084 individuals. In conclusion, this large whole-exome-sequencing study for LDL-C identified a gene not known to be implicated in LDL-C and provides unique insight into the design and analysis of similar experiments.
It is not economically feasible to sequence all study subjects in a large cohort. A cost-effective strategy is to sequence only the subjects with the extreme values of a quantitative trait. In the National Heart, Lung, and Blood Institute Exome Sequencing Project, subjects with the highest or lowest values of body mass index, LDL, or blood pressure were selected for whole-exome sequencing. Failure to account for such trait-dependent sampling can cause severe inflation of type I error and substantial loss of power in quantitative trait analysis, especially when combining results from multiple studies with different selection criteria. We present valid and efficient statistical methods for association analysis of sequencing data under trait-dependent sampling. We pay special attention to gene-based analysis of rare variants. Our methods can be used to perform quantitative trait analysis not only for the trait that is used to select subjects for sequencing but for any other traits that are measured. For a particular trait of interest, our approach properly combines the association results from all studies with measurements of that trait. This meta-analysis is substantially more powerful than the analysis of any single study. By contrast, meta-analysis of standard linear regression results (ignoring trait-dependent sampling) can be less powerful than the analysis of a single study. The advantages of the proposed methods are demonstrated through simulation studies and the National Heart, Lung, and Blood Institute Exome Sequencing Project data. The methods are applicable to other types of genetic association studies and nongenetic studies. R ecent technological advances have made it possible to sequence genomic regions for association studies. At the present time, it is prohibitively expensive to perform large-scale wholeexome sequencing. In the near future, whole-exome sequencing on thousands of subjects will be economically feasible, but not whole-genome sequencing. If a quantitative trait is of primary interest in a large cohort study, a cost-effective strategy is to sequence those subjects with the extreme trait values preferentially. This strategy can substantially increase statistical power (relative to sequencing a random sample with the same number of subjects), as suggested by research in various contexts (1-9). Indeed, such trait-dependent sampling has been adopted in many sequencing projects, including the National Heart, Lung, and Blood Institute (NHLBI) Exome Sequencing Project (ESP) and the Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) resequencing project. The NHLBI ESP consists of multiple studies, each of which is focused on one trait. For the body mass index (BMI) study, 267 subjects with BMI values >40 and 178 subjects with BMI values <25 were selected for sequencing out of a total of 11,468 subjects from Women's Health Initiative (WHI). Similar designs were used for the LDL and blood pressure (BP) studies, although the sampling was based on residuals (to adjust for age, sex,...
More than a decade of genome-wide association studies (GWASs) have identified genetic risk variants that are significantly associated with complex traits. Emerging evidence suggests that the function of trait-associated variants likely acts in a tissue- or cell-type-specific fashion. Yet, it remains challenging to prioritize trait-relevant tissues or cell types to elucidate disease etiology. Here, we present EPIC (cEll tyPe enrIChment), a statistical framework that relates large-scale GWAS summary statistics to cell-type-specific gene expression measurements from single-cell RNA sequencing (scRNA-seq). We derive powerful gene-level test statistics for common and rare variants, separately and jointly, and adopt generalized least squares to prioritize trait-relevant cell types while accounting for the correlation structures both within and between genes. Using enrichment of loci associated with four lipid traits in the liver and enrichment of loci associated with three neurological disorders in the brain as ground truths, we show that EPIC outperforms existing methods. We apply our framework to multiple scRNA-seq datasets from different platforms and identify cell types underlying type 2 diabetes and schizophrenia. The enrichment is replicated using independent GWAS and scRNA-seq datasets and further validated using PubMed search and existing bulk case-control testing results.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.