Biological and empirical evidence suggests that rare variants account for a large proportion of the genetic contributions to complex human diseases. Recent technological advances in high-throughput sequencing platforms have made it possible for researchers to generate comprehensive information on rare variants in large samples. We provide a general framework for association testing with rare variants by combining mutation information across multiple variant sites within a gene and relating the enriched genetic information to disease phenotypes through appropriate regression models. Our framework covers all major study designs (i.e., case-control, cross-sectional, cohort and family studies) and all common phenotypes (e.g., binary, quantitative, and age at onset), and it allows arbitrary covariates (e.g., environmental factors and ancestry variables). We derive theoretically optimal procedures for combining rare mutations and construct suitable test statistics for various biological scenarios. The allele-frequency threshold can be fixed or variable. The effects of the combined rare mutations on the phenotype can be in the same direction or different directions. The proposed methods are statistically more powerful and computationally more efficient than existing ones. An application to a deep-resequencing study of drug targets led to a discovery of rare variants associated with total cholesterol. The relevant software is freely available.
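The idea of combining mutation information across variant sites within a gene can be illustrated with a minimal sketch. It is not the authors' optimal procedure: it assumes a fixed allele-frequency threshold, unweighted collapsing, a quantitative phenotype, no covariates, and genotypes coded as minor-allele counts (0, 1, 2). The function name `burden_score_test` and the toy data are hypothetical.

```python
import math
import numpy as np

def burden_score_test(G, y, maf_threshold=0.01):
    """Score test for a simple burden of rare variants (illustrative assumption)."""
    G = np.asarray(G, dtype=float)   # individuals x variants, minor-allele counts
    y = np.asarray(y, dtype=float)   # quantitative phenotype
    n = G.shape[0]
    maf = G.sum(axis=0) / (2 * n)    # sample minor-allele frequency per variant
    rare = maf < maf_threshold       # fixed-threshold selection of rare variants
    burden = G[:, rare].sum(axis=1)  # collapse: rare alleles carried per individual
    b_c = burden - burden.mean()
    y_c = y - y.mean()
    U = float(b_c @ y_c)                   # score statistic
    V = float((y_c @ y_c) / n * (b_c @ b_c))  # its variance under the null
    if V == 0.0:
        raise ValueError("no variation in burden or phenotype")
    z = U / math.sqrt(V)
    p = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal p-value
    return burden, z, p

# Toy data: 6 individuals, 3 variants; only the first two fall below the threshold.
G_toy = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1], [0, 0, 1], [0, 0, 0]]
y_toy = [3.0, 3.0, 1.0, 1.0, 1.0, 1.0]
burden_toy, z_toy, p_toy = burden_score_test(G_toy, y_toy, maf_threshold=0.1)
```

Because the burden is a single regressor, the test has one degree of freedom regardless of how many variant sites the gene contains, which is the main appeal of collapsing when individual variants are too rare to test one at a time.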
Elevated low-density lipoprotein cholesterol (LDL-C) is a treatable, heritable risk factor for cardiovascular disease. Genome-wide association studies (GWASs) have identified 157 variants associated with lipid levels but are not well suited to assess the impact of rare and low-frequency variants. To determine whether rare or low-frequency coding variants are associated with LDL-C, we exome sequenced 2,005 individuals, including 554 individuals selected for extreme LDL-C (>98th or <2nd percentile). Follow-up analyses included sequencing of 1,302 additional individuals and genotype-based analysis of 52,221 individuals. We observed significant evidence of association between LDL-C and the burden of rare or low-frequency variants in PNPLA5, encoding a phospholipase-domain-containing protein, and both known and previously unidentified variants in PCSK9, LDLR and APOB, three known lipid-related genes. The effect sizes for the burden of rare variants for each associated gene were substantially higher than those observed for individual SNPs identified from GWASs. We replicated the PNPLA5 signal in an independent large-scale sequencing study of 2,084 individuals. In conclusion, this large whole-exome-sequencing study for LDL-C identified a gene not known to be implicated in LDL-C and provides unique insight into the design and analysis of similar experiments.
Background: Ezetimibe lowers plasma levels of low-density lipoprotein (LDL) cholesterol by inhibiting the activity of the Niemann–Pick C1-like 1 (NPC1L1) protein. However, whether such inhibition reduces the risk of coronary heart disease is not known. Human mutations that inactivate a gene encoding a drug target can mimic the action of an inhibitory drug and thus can be used to infer potential effects of that drug. Methods: We sequenced the exons of NPC1L1 in 7364 patients with coronary heart disease and in 14,728 controls without such disease who were of European, African, or South Asian ancestry. We identified carriers of inactivating mutations (nonsense, splice-site, or frameshift mutations). In addition, we genotyped a specific inactivating mutation (p.Arg406X) in 22,590 patients with coronary heart disease and in 68,412 controls. We tested the association between the presence of an inactivating mutation and both plasma lipid levels and the risk of coronary heart disease. Results: With sequencing, we identified 15 distinct NPC1L1 inactivating mutations; approximately 1 in every 650 persons was a heterozygous carrier for 1 of these mutations. Heterozygous carriers of NPC1L1 inactivating mutations had a mean LDL cholesterol level that was 12 mg per deciliter (0.31 mmol per liter) lower than that in noncarriers (P = 0.04). Carrier status was associated with a relative reduction of 53% in the risk of coronary heart disease (odds ratio for carriers, 0.47; 95% confidence interval, 0.25 to 0.87; P = 0.008). In total, only 11 of 29,954 patients with coronary heart disease had an inactivating mutation (carrier frequency, 0.04%) in contrast to 71 of 83,140 controls (carrier frequency, 0.09%). Conclusions: Naturally occurring mutations that disrupt NPC1L1 function were found to be associated with reduced plasma LDL cholesterol levels and a reduced risk of coronary heart disease. (Funded by the National Institutes of Health and others.)
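The carrier counts reported above can be checked with a few lines of arithmetic. The sketch below recomputes the two carrier frequencies and a crude (unadjusted, pooled) odds ratio from the published counts. The function names are hypothetical, and the crude pooled estimate is an assumption of this illustration, not the analysis used in the study.

```python
def carrier_frequency(carriers, total):
    """Fraction of individuals carrying an inactivating mutation."""
    return carriers / total

def crude_odds_ratio(case_carriers, case_total, control_carriers, control_total):
    """Unadjusted odds ratio from a pooled 2x2 table (illustrative only)."""
    case_odds = case_carriers / (case_total - case_carriers)
    control_odds = control_carriers / (control_total - control_carriers)
    return case_odds / control_odds

# Totals are the sequenced plus genotyped samples quoted in the abstract.
cases_total = 7364 + 22590       # 29,954 patients with coronary heart disease
controls_total = 14728 + 68412   # 83,140 controls

case_freq = carrier_frequency(11, cases_total)
control_freq = carrier_frequency(71, controls_total)
crude_or = crude_odds_ratio(11, cases_total, 71, controls_total)

print(f"case carrier frequency:    {100 * case_freq:.2f}%")
print(f"control carrier frequency: {100 * control_freq:.2f}%")
print(f"crude odds ratio:          {crude_or:.2f}")
```

The frequencies round to the reported 0.04% and 0.09%. The crude odds ratio comes out at roughly 0.43, somewhat below the reported 0.47. The abstract does not say how its estimate was obtained, so the gap may reflect adjustment or stratification that a pooled table ignores.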
It is not economically feasible to sequence all study subjects in a large cohort. A cost-effective strategy is to sequence only the subjects with the extreme values of a quantitative trait. In the National Heart, Lung, and Blood Institute Exome Sequencing Project, subjects with the highest or lowest values of body mass index, LDL, or blood pressure were selected for whole-exome sequencing. Failure to account for such trait-dependent sampling can cause severe inflation of type I error and substantial loss of power in quantitative trait analysis, especially when combining results from multiple studies with different selection criteria. We present valid and efficient statistical methods for association analysis of sequencing data under trait-dependent sampling. We pay special attention to gene-based analysis of rare variants. Our methods can be used to perform quantitative trait analysis not only for the trait that is used to select subjects for sequencing but for any other traits that are measured. For a particular trait of interest, our approach properly combines the association results from all studies with measurements of that trait. This meta-analysis is substantially more powerful than the analysis of any single study. By contrast, meta-analysis of standard linear regression results (ignoring trait-dependent sampling) can be less powerful than the analysis of a single study. The advantages of the proposed methods are demonstrated through simulation studies and the National Heart, Lung, and Blood Institute Exome Sequencing Project data. The methods are applicable to other types of genetic association studies and nongenetic studies.

Recent technological advances have made it possible to sequence genomic regions for association studies. At the present time, it is prohibitively expensive to perform large-scale whole-exome sequencing. In the near future, whole-exome sequencing on thousands of subjects will be economically feasible, but not whole-genome sequencing.
If a quantitative trait is of primary interest in a large cohort study, a cost-effective strategy is to sequence those subjects with the extreme trait values preferentially. This strategy can substantially increase statistical power (relative to sequencing a random sample with the same number of subjects), as suggested by research in various contexts (1-9). Indeed, such trait-dependent sampling has been adopted in many sequencing projects, including the National Heart, Lung, and Blood Institute (NHLBI) Exome Sequencing Project (ESP) and the Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) resequencing project. The NHLBI ESP consists of multiple studies, each of which is focused on one trait. For the body mass index (BMI) study, 267 subjects with BMI values >40 and 178 subjects with BMI values <25 were selected for sequencing out of a total of 11,468 subjects from the Women's Health Initiative (WHI). Similar designs were used for the LDL and blood pressure (BP) studies, although the sampling was based on residuals (to adjust for age, sex,...
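To see why an extreme-trait sample cannot be analyzed as if it were random, consider a small simulation. It is only a sketch under stated assumptions, not the authors' method: it uses a single variant with an additive effect, a standard-normal error, symmetric 5% tails, and ordinary least squares. All parameter values and function names are hypothetical. It shows one symptom of ignoring the sampling scheme, namely that the naive regression slope in the selected subjects no longer estimates the population effect.

```python
import numpy as np

def ols_slope(x, y):
    """Least-squares slope of y on x (single regressor with intercept)."""
    x_c = x - x.mean()
    return float(x_c @ (y - y.mean()) / (x_c @ x_c))

def simulate_extreme_sampling(n=20000, maf=0.2, beta=0.3, tail=0.05, seed=0):
    """Compare the slope in the full cohort with the slope in the extremes."""
    rng = np.random.default_rng(seed)
    g = rng.binomial(2, maf, size=n).astype(float)   # genotype: minor-allele count
    y = beta * g + rng.normal(size=n)                # quantitative trait
    lo, hi = np.quantile(y, [tail, 1.0 - tail])
    keep = (y <= lo) | (y >= hi)                     # "sequence" only the two tails
    full = ols_slope(g, y)                           # analysis of the whole cohort
    naive = ols_slope(g[keep], y[keep])              # ignores how subjects were chosen
    return full, naive, int(keep.sum())

full_slope, naive_slope, n_selected = simulate_extreme_sampling()
print(f"full-cohort slope: {full_slope:.2f}")
print(f"naive slope in {n_selected} selected subjects: {naive_slope:.2f}")
```

Selecting on the trait inflates its sample variance, so the naive slope in the tails is several times larger than the full-cohort slope. A valid analysis has to model the selection explicitly, which is the problem the paper addresses.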
Recent advances in sequencing technologies have made it possible to explore the influence of rare variants on complex diseases and traits. Meta-analysis is essential to this exploration because large sample sizes are required to detect rare variants. Several methods are available to conduct meta-analysis for rare variants under fixed-effects models, which assume that the genetic effects are the same across all studies. In practice, genetic associations are likely to be heterogeneous among studies because of differences in population composition, environmental factors, phenotype and genotype measurements, or analysis methods. We propose random-effects models that allow the genetic effects to vary among studies and develop the corresponding meta-analysis methods for gene-level association tests. Our methods take score statistics, rather than individual participant data, as input and thus can accommodate any study design and any phenotype. We produce the random-effects versions of all commonly used gene-level association tests, including burden, variable threshold, and variance-component tests. We demonstrate through extensive simulation studies that our random-effects tests are substantially more powerful than the fixed-effects tests in the presence of moderate and high between-study heterogeneity and achieve similar power to the latter when the heterogeneity is low. The usefulness of the proposed methods is further illustrated with data from the National Heart, Lung, and Blood Institute Exome Sequencing Project (NHLBI ESP). The relevant software is freely available.
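The contrast between fixed-effects and random-effects combination of per-study score statistics can be sketched as follows. This is not the authors' procedure: it substitutes the classical DerSimonian-Laird moment estimator for the between-study variance and handles a single burden-type score per study. The function name, the dictionary keys, and the example numbers are hypothetical. Each study k is assumed to contribute a score U_k and its variance V_k, so that U_k / V_k approximates the effect estimate with variance 1 / V_k.

```python
import math

def meta_analyze_scores(U, V):
    """Fixed-effects and DerSimonian-Laird random-effects summaries (sketch)."""
    beta = [u / v for u, v in zip(U, V)]   # per-study effect estimates
    w = list(V)                            # inverse-variance weights (1 / (1/V_k))
    sw = sum(w)
    beta_fixed = sum(U) / sw               # common-effect estimate
    z_fixed = sum(U) / math.sqrt(sw)       # fixed-effects score test
    # Cochran's Q measures dispersion of the study effects around beta_fixed.
    q = sum(wi * (bi - beta_fixed) ** 2 for wi, bi in zip(w, beta))
    df = len(U) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0   # between-study variance
    # Random-effects weights add tau2 to every study's sampling variance.
    w_re = [1.0 / (1.0 / wi + tau2) for wi in w]
    beta_random = sum(wr * b for wr, b in zip(w_re, beta)) / sum(w_re)
    se_random = math.sqrt(1.0 / sum(w_re))
    return {
        "beta_fixed": beta_fixed, "z_fixed": z_fixed, "q": q, "tau2": tau2,
        "beta_random": beta_random, "z_random": beta_random / se_random,
    }

# Hypothetical scores from three studies; the second points the other way.
result = meta_analyze_scores(U=[4.0, -1.0, 6.0], V=[10.0, 8.0, 12.0])
```

When the estimated between-study variance is zero, the random-effects weights reduce to the fixed-effects weights and the two summaries coincide, which matches the abstract's remark that the two approaches perform similarly under low heterogeneity.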