Whole-exome sequencing (WES) and whole-genome sequencing (WGS) are expected to be critical to further elucidate the missing genetic heritability of Alzheimer’s disease (AD) risk by identifying rare coding and/or noncoding variants that contribute to AD pathogenesis. In the United States, the Alzheimer’s Disease Sequencing Project (ADSP) has taken a leading role in sequencing AD-related samples at scale, with the resultant data being made publicly available to researchers to generate new insights into the genetic etiology of AD. To achieve sufficient power, the ADSP has adopted a study design in which subsets of larger AD cohorts are collected and sequenced across multiple centers, using a variety of sequencing kits. This approach may lead to variable variant quality across sequencing centers and/or kits. Here, we performed exome-wide and genome-wide association analyses on AD risk using the latest ADSP WES and WGS data releases. We observed that many variants displayed large variation in allele frequencies across sequencing centers/kits and contributed to spurious association signals with AD risk. We also observed that adjusting for sequencing kit/center in association models could not fully account for these spurious signals. To address this issue, we designed and implemented novel filters that aim to capture and remove these center/kit-specific artifactual variants. We conclude by deriving a novel, fast, and robust approach to filter variants that represent sequencing center- or kit-related artifacts underlying spurious associations with AD risk in ADSP WES and WGS data. This approach will be important to support future robust genetic association studies on ADSP data, as well as other studies with similar designs.

Author Summary

Next-generation sequencing data represent a highly valuable resource to uncover rare coding and/or noncoding genetic variants that contribute to Alzheimer’s disease risk.
To achieve the large sample sizes that such analyses require, the Alzheimer’s Disease Sequencing Project (ADSP) has taken the leading role in sequencing Alzheimer’s disease-related samples at scale in the United States. The ADSP’s study design, however, leads to variable variant quality across the involved sequencing centers, necessitating a quality control approach that ensures robust genetic association analyses. Here, we present and validate a rigorous quality control pipeline, for which we specifically developed a new strategy to handle inter-center variant quality issues in the ADSP. In doing so, we provide a first glance into exome- and genome-wide associations with Alzheimer’s disease risk using the latest releases of ADSP data (20.5k and 16.9k individuals, respectively). In sum, our pipeline is important to support future robust genetic association studies on ADSP data, as well as other studies with similar designs. This in turn will contribute to accelerating Alzheimer’s disease gene discovery and gene-driven therapy development.
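
The text above does not spell out how the center/kit filters are computed, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes that per-center alternate and reference allele counts are available for each variant, and it flags a variant when a Pearson chi-square statistic for allele-frequency heterogeneity across centers exceeds a user-chosen critical value. The function names, the input layout, and the threshold are all hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical input layout: center or kit label -> (alt allele count, ref allele count).
CenterCounts = Dict[str, Tuple[int, int]]


def center_heterogeneity_chi2(counts: CenterCounts) -> float:
    """Pearson chi-square statistic for a 2 x k table of allele counts by center.

    A large value means the alternate allele frequency differs across centers
    more than expected by chance, which may indicate a technical artifact.
    """
    total_alt = sum(alt for alt, _ in counts.values())
    total_ref = sum(ref for _, ref in counts.values())
    grand_total = total_alt + total_ref

    # A monomorphic variant (or an empty table) carries no heterogeneity signal.
    if total_alt == 0 or total_ref == 0:
        return 0.0

    statistic = 0.0
    for alt, ref in counts.values():
        n_alleles = alt + ref
        if n_alleles == 0:
            continue  # this center contributed no genotypes for the variant
        expected_alt = n_alleles * total_alt / grand_total
        expected_ref = n_alleles * total_ref / grand_total
        statistic += (alt - expected_alt) ** 2 / expected_alt
        statistic += (ref - expected_ref) ** 2 / expected_ref
    return statistic


def flag_variant(counts: CenterCounts, critical_value: float) -> bool:
    """Return True when the variant looks center-specific under the assumed threshold."""
    return center_heterogeneity_chi2(counts) > critical_value


if __name__ == "__main__":
    consistent = {"center_A": (10, 90), "center_B": (20, 180)}
    suspicious = {"center_A": (50, 50), "center_B": (0, 100)}
    # 3.841 is the 5% critical value of a chi-square distribution with 1 degree of freedom.
    print(flag_variant(consistent, critical_value=3.841))
    print(flag_variant(suspicious, critical_value=3.841))
```

In practice one would probably compute such a statistic within a single diagnostic group (for example, controls only) so that genuine case-control differences are not mistaken for center effects, and would use a much stricter threshold to account for the number of variants tested. Both choices are assumptions of this sketch.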