Meta-analysis of gene-level tests for rare variant association

Liu, Dajiang; Peloso, Gina M.; Zhan, Xiaowei; Holmen, Oddgeir L.; Zawistowski, Matthew; Feng, Shuang; Nikpay, Majid; Auer, Paul L.; Goel, Anuj; Zhang, He; Peters, Ulrike; Farrall, Martin; Orho‐Melander, Marju; Kooperberg, Charles; McPherson, Ruth; Watkins, Hugh; Willer, Cristen J.; Hveem, Kristian; Melander, Olle; Kathiresan, Sekar; Abecasis, Gonçalo R.

doi:10.1038/ng.2852

Cited by 180 publications

(255 citation statements)

References 42 publications

Supporting

Mentioning

254

Contrasting

“…The summary statistics include the score vector and its covariance matrix, which have been widely used for combining multiple sequencing studies for a meta-analysis. 22,23 This makes it easier to apply the proposed method to existing sequencing studies to get updated p values incorporating functional annotation scores. Moreover, this feature enables simple but useful extensions to other types of studies, such as family-based association studies and longitudinal studies.…”

Section: Discussion (mentioning)

confidence: 99%

“…22,23 The score vector $S$ is sufficient for constructing test statistics $Q_{\rho_{k,m},k}$ for any $\rho_{k,m}$ and $k$ and the resulting unified statistic. We propose the following resampling steps for estimating their distributions by using $S$. …”

Section: Resampling Methods Using Summary Statistics Only (mentioning)

confidence: 99%

“…22,23 We show that the same set of summary statistics can be used for constructing meta-analysis test statistics that incorporate functional scores and for calculating the meta-analysis p values. Let $S_l$ be the vector of score statistics corresponding to study $l$, $1 \le l \le L$, for the same set of genetic variants and $\Sigma_l$ be the estimated covariance matrix of $S_l$.…”

Section: Meta-analysis Using Summary Statistics (mentioning)

confidence: 99%
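
The excerpt above describes combining per-study score vectors and their estimated covariance matrices. As a minimal sketch of that general idea, and not the cited authors' implementation, the Python below pools summary statistics from independent studies and forms a weighted burden statistic. The function name `meta_burden_test`, its arguments, and the example numbers are all hypothetical; it assumes the studies are independent, the variants are listed in the same order in every study, and the statistic is referred to a chi-square distribution with 1 degree of freedom.

```python
import math


def meta_burden_test(scores, covariances, weights=None):
    """Pool per-study score statistics and compute a burden test (illustrative sketch).

    scores      -- list of per-study score vectors (one list of floats per study)
    covariances -- list of per-study covariance matrices (nested lists)
    weights     -- optional per-variant weights; defaults to equal weights (assumption)
    """
    m = len(scores[0])
    w = [1.0] * m if weights is None else [float(x) for x in weights]

    # For independent studies, score vectors add and so do their covariance matrices.
    pooled_s = [sum(s[j] for s in scores) for j in range(m)]
    pooled_v = [[sum(v[j][k] for v in covariances) for k in range(m)]
                for j in range(m)]

    # Burden statistic: (w' S)^2 / (w' V w).
    numerator = sum(w[j] * pooled_s[j] for j in range(m))
    variance = sum(w[j] * pooled_v[j][k] * w[k]
                   for j in range(m) for k in range(m))
    stat = numerator ** 2 / variance

    # Upper tail of a chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Made-up summary statistics for two studies and two variants.
study_scores = [[1.0, 2.0], [0.5, 1.5]]
study_covs = [[[1.0, 0.2], [0.2, 1.0]],
              [[0.5, 0.1], [0.1, 0.5]]]
stat, p_value = meta_burden_test(study_scores, study_covs)
print(f"burden statistic = {stat:.3f}, p = {p_value:.4f}")
```

Because only the score vectors and covariance matrices enter the calculation, no individual-level genotype or phenotype data need to be shared between studies, which is the property the citing authors highlight.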

Unified Sequence-Based Association Tests Allowing for Multiple Functional Annotations and Meta-analysis of Noncoding Variation in Metabochip Data

Lee et al. 2017, The American Journal of Human Genetics

Substantial progress has been made in the functional annotation of genetic variation in the human genome. Integrative analysis that incorporates such functional annotations into sequencing studies can aid the discovery of disease-associated genetic variants, especially those with unknown function and located outside protein-coding regions. Direct incorporation of one functional annotation as weight in existing dispersion and burden tests can suffer substantial loss of power when the functional annotation is not predictive of the risk status of a variant. Here, we have developed unified tests that can utilize multiple functional annotations simultaneously for integrative association analysis with efficient computational techniques. We show that the proposed tests significantly improve power when variant risk status can be predicted by functional annotations. Importantly, when functional annotations are not predictive of risk status, the proposed tests incur only minimal loss of power in relation to existing dispersion and burden tests, and under certain circumstances they can even have improved power by learning a weight that better approximates the underlying disease model in a data-adaptive manner. The tests can be constructed with summary statistics of existing dispersion and burden tests for sequencing data, therefore allowing meta-analysis of multiple studies without sharing individual-level data. We applied the proposed tests to a meta-analysis of noncoding rare variants in Metabochip data on 12,281 individuals from eight studies for lipid traits. By incorporating the Eigen functional score, we detected significant associations between noncoding rare variants in SLC22A3 and low-density lipoprotein and total cholesterol, associations that are missed by standard dispersion and burden tests.

“…Taking GWAS as an example, virtually all meta-analyses to date have been conducted at the summary-statistics level rather than the raw-data level (Lango and others, 2010; Liu and others, 2014). The emergence of big data, such as next-generation sequencing data, makes the collation of raw data even more challenging.…”

Section: Introduction (mentioning)

confidence: 99%

Sparse meta-analysis with high-dimensional data

Zhang, Avery, et al. 2016, Biostatistics

SUMMARY: Meta-analysis plays an important role in summarizing and synthesizing scientific evidence derived from multiple studies. With high-dimensional data, the incorporation of variable selection into meta-analysis improves model interpretation and prediction. Existing variable selection methods require direct access to raw data, which may not be available in practical situations. We propose a new approach, sparse meta-analysis (SMA), in which variable selection for meta-analysis is based solely on summary statistics and the effect sizes of each covariate are allowed to vary among studies. We show that the SMA enjoys the oracle property if the estimated covariance matrix of the parameter estimators from each study is available. We also show that our approach achieves selection consistency and estimation consistency even when summary statistics include only the variance estimators or no variance/covariance information at all. Simulation studies and applications to high-throughput genomics studies demonstrate the usefulness of our approach.

“…Meta-analysis can also be performed to combine results from different studies or populations. 23 SEQSpark is ideal to use for the analysis of large-scale genetic epidemiological studies. It has higher computational efficiency for data quality control, annotation, and association analysis than other available software.…”

mentioning

confidence: 99%

SEQSpark: A Complete Analysis Tool for Large-Scale Rare Variant Association Studies Using Whole-Genome and Exome Sequence Data

Zhang, Zhao, et al. 2017, The American Journal of Human Genetics

Massively parallel sequencing technologies provide great opportunities for discovering rare susceptibility variants involved in complex disease etiology via large-scale imputation and exome and whole-genome sequence-based association studies. Due to modest effect sizes, large sample sizes of tens to hundreds of thousands of individuals are required for adequately powered studies. Current analytical tools are obsolete when it comes to handling these large datasets. To facilitate the analysis of large-scale sequence-based studies, we developed SEQSpark which implements parallel processing based on Spark to increase the speed and efficiency of performing data quality control, annotation, and association analysis. To demonstrate the versatility and speed of SEQSpark, we analyzed whole-genome sequence data from the UK10K, testing for associations with waist-to-hip ratios. The analysis, which was completed in 1.5 hr, included loading data, annotation, principal component analysis, and single variant and rare variant aggregate association analysis of >9 million variants. For rare variant aggregate analysis, an exome-wide significant association ($p < 2.5 \times 10^{-6}$) was observed with CCDC62 (SKAT-O [$p = 6.89 \times 10^{-7}$], combined multivariate collapsing [$p = 1.48 \times 10^{-6}$], and burden of rare variants [$p = 1.48 \times 10^{-6}$]). SEQSpark was also used to analyze 50,000 simulated exomes and it required 1.75 hr for the analysis of a quantitative trait using several rare variant aggregate association methods. Additionally, the performance of SEQSpark was compared to Variant Association Tools and PLINK/SEQ. SEQSpark was always faster and in some situations computation was reduced to a hundredth of the time. SEQSpark will empower large sequence-based epidemiological studies to quickly elucidate genetic variation involved in the etiology of complex traits.

Massively parallel sequencing technologies are generating an unprecedented amount of sequence data on various kinds of samples including human exomes and genomes. Many rare variant association methods have been developed to elucidate the underlying disease etiology using large-scale population-based sequence datasets. 1-5 Although some findings are promising, 6 statistical power analyses performed with simulated data demonstrate that large sample sizes of tens or even hundreds of thousands of individuals are required for adequately powered studies. 7,8 Large-scale genetic epidemiological studies are currently ongoing, including the Trans-Omics for Precision Medicine program (TopMed) (see Web Resources) and UK BioBank 9 studies. Additional large-scale genetic epidemiological studies are emerging that will generate whole-genome sequence (WGS) data or impute WGS data into existing genotype array data to better understand the genetic etiology of complex traits.

It is problematic to analyze large datasets of massively parallel sequence data given the limitations of current analytic tools for annotation, data quality control, and association testing. 9,10 Analytic tools such as PLINK/SEQ and ...
