Quantitative trait analysis in sequencing studies under trait-dependent sampling

Lin, Dan-Yu; Zeng, Donglin; Tang, Zheng Zheng

doi:10.1073/pnas.1221713110

Cited by 50 publications

(102 citation statements)

References 21 publications


“…We focused on the BMI values for the African American participants of the WHI cohort. Among the 8,142 African American participants who were genotyped by the Affymetrix 6.0 arrays, 360 with BMI values > 40 or < 25 were selected for whole-exome sequencing in the NHLBI ESP (2). The distribution of the BMI values is displayed in SI Appendix, Fig.…”

Section: Results

mentioning

confidence: 99%
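
The selection rule quoted in the statement above (sequencing only subjects whose BMI is above 40 or below 25) is an instance of trait-dependent sampling. A minimal sketch of such a rule follows; the function name `select_extremes`, the toy BMI values, and the strict inequalities at the cutoffs are illustrative assumptions, not taken from the cited study or its software.

```python
from typing import List, Sequence


def select_extremes(trait: Sequence[float], lower: float, upper: float) -> List[int]:
    """Return indices of subjects whose trait value is strictly below
    `lower` or strictly above `upper` (hypothetical helper)."""
    return [i for i, y in enumerate(trait) if y < lower or y > upper]


# Toy cohort of BMI values (made up for illustration only).
bmi = [22.4, 31.0, 41.7, 24.9, 25.0, 40.0, 45.2, 28.3]

# Cutoffs mirror the quoted design: BMI < 25 or BMI > 40.
selected = select_extremes(bmi, lower=25.0, upper=40.0)
print(selected)  # indices of subjects chosen for sequencing
```

Because selection depends on the trait itself, the sequenced subjects are not a random subset of the cohort, which is why the citing papers stress that downstream analysis should account for the sampling scheme.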

“…However, it is still economically infeasible to sequence all subjects in a large cohort, and, therefore, only a subset of cohort members can be selected for sequencing. A cost-effective sampling strategy is to preferentially select subjects in the extremes of a quantitative trait distribution or those with a specific disease (2,3). For case−control studies, an equal number of cases and controls provides more power than other case−control ratios.…”

mentioning

confidence: 99%

“…For case−control studies, an equal number of cases and controls provides more power than other case−control ratios. For quantitative traits, the power increases as more extreme values are sampled (2).…”

mentioning

confidence: 99%

“…The NHLBI ESP consists of three studies that sequenced subjects with the largest and smallest values of body mass index (BMI), low-density lipoprotein, and blood pressure, one case−control study on myocardial infarction, and one case-only study on stroke (2). The CHARGE resequencing project selected subjects with the highest values of 14 quantitative traits, as well as a random sample (4).…”

mentioning

confidence: 99%


Integrative analysis of sequencing and array genotype data for discovering disease associations with rare mutations

Auer et al. 2015

Proc. Natl. Acad. Sci. U.S.A.

Self Cite


In the large cohorts that have been used for genome-wide association studies (GWAS), it is prohibitively expensive to sequence all cohort members. A cost-effective strategy is to sequence subjects with extreme values of quantitative traits or those with specific diseases. By imputing the sequencing data from the GWAS data for the cohort members who are not selected for sequencing, one can dramatically increase the number of subjects with information on rare variants. However, ignoring the uncertainties of imputed rare variants in downstream association analysis will inflate the type I error when sequenced subjects are not a random subset of the GWAS subjects. In this article, we provide a valid and efficient approach to combining observed and imputed data on rare variants. We consider commonly used gene-level association tests, all of which are constructed from the score statistic for assessing the effects of individual variants on the trait of interest. We show that the score statistic based on the observed genotypes for sequenced subjects and the imputed genotypes for nonsequenced subjects is unbiased. We derive a robust variance estimator that reflects the true variability of the score statistic regardless of the sampling scheme and imputation quality, such that the corresponding association tests always have correct type I error. We demonstrate through extensive simulation studies that the proposed tests are substantially more powerful than the use of accurately imputed variants only and the use of sequencing data alone. We provide an application to the Women's Health Initiative. The relevant software is freely available.

data integration | gene-level association tests | genotype imputation | linkage disequilibrium | whole-exome sequencing

Recent technological advances have made it possible to conduct high-throughput DNA sequencing studies on rare variants, which have a stronger impact on complex diseases and traits than common variants (1). However, it is still economically infeasible to sequence all subjects in a large cohort, and, therefore, only a subset of cohort members can be selected for sequencing. A cost-effective sampling strategy is to preferentially select subjects in the extremes of a quantitative trait distribution or those with a specific disease (2, 3). For case−control studies, an equal number of cases and controls provides more power than other case−control ratios. For quantitative traits, the power increases as more extreme values are sampled (2).

Trait-dependent sampling has been adopted in many sequencing studies, including the National Heart, Lung, and Blood Institute (NHLBI) Exome Sequencing Project (ESP) and the Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) resequencing project. The NHLBI ESP consists of three studies that sequenced subjects with the largest and smallest values of body mass index (BMI), low-density lipoprotein, and blood pressure, one case−control study on myocardial infarction, and one case-only study on stroke (2). The CHARGE resequencing pr...


“…Failure to account for trait-dependent sampling might increase type I error and reduce power. 27 Thus, we used SCORESeqTDS to generate summary statistics for the LDL, BMI, and BP phenotype groups and used RAREMETALWORKER for the EOMI, stroke, and DPR phenotype groups. Figure S3 shows the workflow for analyzing the six phenotype groups.…”

Section: Software

mentioning

confidence: 99%

Meta-analysis for Discovering Rare-Variant Associations: Statistical Methods and Software Programs

Tang, Lin 2015

The American Journal of Human Genetics

Self Cite


There is heightened interest in using next-generation sequencing technologies to identify rare variants that influence complex human diseases and traits. Meta-analysis is essential to this endeavor because large sample sizes are required for detecting associations with rare variants. In this article, we provide a comprehensive overview of statistical methods for meta-analysis of sequencing studies for discovering rare-variant associations. Specifically, we discuss the calculation of relevant summary statistics from participating studies, the construction of gene-level association tests, the choice of transformation for quantitative traits, the use of fixed-effects versus random-effects models, and the removal of shadow association signals through conditional analysis. We also show that meta-analysis based on properly calculated summary statistics is as powerful as joint analysis of individual-participant data. In addition, we demonstrate the performance of different meta-analysis methods by using both simulated and empirical data. We then compare four major software packages for meta-analysis of rare-variant associations (MASS, RAREMETAL, MetaSKAT, and seqMeta) in terms of the underlying statistical methodology, analysis pipeline, and software interface. Finally, we present PreMeta, a software interface that integrates the four meta-analysis packages and allows a consortium to combine otherwise incompatible summary statistics.


Statistical challenges in high‐dimensional molecular and genetic epidemiology

2017


Molecular and genetic association studies conducted in well‐characterized longitudinal cohorts offer a powerful approach to investigate factors influencing disease course or complex trait expression. As measurement technologies continue to develop and evolve, studies based on existing cohorts raise methodological challenges. Five such challenges are illustrated in two long‐term inter‐disciplinary collaborations. In one, molecular genetic prognostic factors in the natural history of node‐negative breast cancer are investigated using a combination of hypothesis‐testing and hypothesis‐generating molecular approaches. In the other, genome‐wide association methods are applied to identify genes for multiple traits in extended follow‐up data from participants of a therapeutic RCT in type 1 diabetes.

The Canadian Journal of Statistics 46: 24–40; 2018 © 2017 Statistical Society of Canada
