Candidate gene and genome-wide association studies (GWAS) have identified genetic variants that modulate risk for human disease; many of these associations require further study to replicate the results. Here we report the first large-scale application of the phenome-wide association study (PheWAS) paradigm within electronic medical records (EMRs), an unbiased approach to replication and discovery that interrogates relationships between targeted genotypes and multiple phenotypes. We scanned for associations between 3,144 single-nucleotide polymorphisms (previously implicated by GWAS as mediators of human traits) and 1,358 EMR-derived phenotypes in 13,835 individuals of European ancestry. This PheWAS replicated 66% (51/77) of sufficiently powered prior GWAS associations and revealed 63 potentially pleiotropic associations with P < 4.6 × 10⁻⁶ (false discovery rate < 0.1); the strongest of these novel associations were replicated in an independent cohort (n = 7,406). These findings validate PheWAS as a tool to allow unbiased interrogation across multiple phenotypes in EMR-based cohorts and to enhance analysis of the genomic basis of human disease.
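The abstract above reports a P-value cutoff paired with a false discovery rate bound but does not say which FDR procedure was used. As a minimal sketch, assuming the widely used Benjamini-Hochberg step-up procedure (an assumption, not the authors' stated method), the relationship between an FDR level and a P-value cutoff can be illustrated as follows. The function name `benjamini_hochberg` and the example P-values are hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.1):
    """Benjamini-Hochberg step-up procedure (illustrative sketch).

    Returns (threshold, rejected), where threshold is the largest P-value
    declared significant (None if nothing passes) and rejected is a list
    of booleans aligned with p_values.
    """
    m = len(p_values)
    # Indices of the tests, ordered from smallest to largest P-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    threshold = None
    for rank, idx in enumerate(order, start=1):
        # Keep the largest P-value that satisfies p_(k) <= (k / m) * fdr.
        if p_values[idx] <= rank / m * fdr:
            threshold = p_values[idx]
    rejected = [threshold is not None and p <= threshold for p in p_values]
    return threshold, rejected


# Hypothetical P-values for ten SNP-phenotype tests (not data from the study).
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
cutoff, flags = benjamini_hochberg(example_p, fdr=0.05)
print(cutoff, sum(flags))
```

In a real PheWAS the list would hold one P-value per SNP-phenotype pair, so the effective cutoff depends on the full distribution of P-values rather than on a fixed Bonferroni-style divisor.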
Introduction
Late-onset Alzheimer's disease (LOAD, onset age > 60 years) is the most prevalent dementia in the elderly [1], and risk is partially driven by genetics [2]. Many of the loci responsible for this genetic risk were identified by genome-wide association studies (GWAS) [3][4][5][6][7][8]. To identify additional LOAD risk loci, we performed the largest GWAS to date (89,769 individuals), analyzing both common and rare variants. We confirm 20 previous LOAD risk loci and identify four new genome-wide loci (IQCK, ACE, ADAM10, and ADAMTS1). Pathway analysis of these data implicates the immune system and lipid metabolism, and for the first time tau binding proteins and APP metabolism. These findings show that genetic variants affecting APP and Aβ processing are not only associated with early-onset autosomal dominant AD but also with LOAD. Analysis of AD risk genes and pathways shows enrichment for rare variants (P = 1.32 × 10⁻⁷), indicating that additional rare variants remain to be identified.
Main Text
Our previous work identified 19 genome-wide significant common variant signals, in addition to APOE [9], that influence risk for LOAD. These signals, combined with 'subthreshold' common variant associations, account for ~31% of the genetic variance of LOAD [2], leaving the majority of genetic risk uncharacterized [10]. To search for additional signals, we conducted a GWAS meta-analysis of non-Hispanic Whites (NHW) using a larger sample (17 new, 46 total datasets) from our group, the International Genomics of Alzheimer's Project (IGAP) (composed of four AD consortia: ADGC, CHARGE, EADI, and GERAD). This sample increases our previous discovery sample (Stage 1) by 29% for cases and 13% for controls (N = 21,982 cases; 41,944 controls) (Supplementary Tables 1 and 2, and Supplementary Note).
To sample both common and rare variants (minor allele frequency MAF ≥ 0.01, and MAF < 0.01, respectively), we imputed the discovery datasets using a 1000 Genomes reference panel [11] consisting of 36,648,992 single-nucleotide variants, 1,380,736 insertions/deletions, and 13,805 structural variants. After quality control, 9,456,058 common variants and 2,024,574 rare variants were selected for analysis (a 63% increase from our previous common variant analysis in 2013). Genotype dosages were analyzed within each dataset, and then combined with meta-analysis (Supplementary Figures 1 and 2 and Supplementary Table 3). The Stage 1 discovery meta-analysis was first followed by Stage 2 using the I-select chip we previously developed in Lambert et al. (including 11,632 variants, N = 18,845) and finally Stage 3A (N = 6,998). The final sample was 33,692 clinical AD cases and 56,077 controls. Meta-analysis of Stages 1 and 2 produced 21 associations with P ≤ 5 × 10⁻⁸ (Table 1 and Figure 1). Of these, 18 were previously reported as genome-wide significant and three of them are signals not initially described in Lambert et al.: the rare R47H TREM2 coding va...
The promise of “personalized medicine” guided by an understanding of each individual’s genome has been fostered by increasingly powerful and economical methods to acquire clinically relevant features. We describe operational implementation of prospective genotyping linked to an advanced clinical decision support system to guide individualized healthcare in a large academic health center. This approach to personalized medicine includes patient and healthcare provider engagement, identifying relevant genetic variation for implementation, assay reliability, point-of-care decision support, and necessary institutional investments. In one year, approximately 3,000 patients, most scheduled for cardiac catheterization, were genotyped on a multiplexed platform including CYP2C19 variants that modulate response to the widely used antiplatelet drug clopidogrel. These data are deposited into the Electronic Medical Record and point-of-care decision support is deployed when clopidogrel is prescribed for those with variant genotypes. The establishment of programs such as this is a first step toward implementing and evaluating strategies for personalized medicine.
We repurposed existing genotypes in DNA biobanks across the Electronic Medical Records and Genomics network to perform a genome-wide association study for primary hypothyroidism, the most common thyroid disease. Electronic selection algorithms incorporating billing codes, laboratory values, text queries, and medication records identified 1317 cases and 5053 controls of European ancestry within five electronic medical records (EMRs); the algorithms' positive predictive values were 92.4% and 98.5% for cases and controls, respectively. Four single-nucleotide polymorphisms (SNPs) in linkage disequilibrium at 9q22 near FOXE1 were associated with hypothyroidism at genome-wide significance, the strongest being rs7850258 (odds ratio [OR] 0.74, p = 3.96 × 10⁻⁹). This association was replicated in a set of 263 cases and 1616 controls (OR = 0.60, p = 5.7 × 10⁻⁶). A phenome-wide association study (PheWAS) that was performed on this locus with 13,617 individuals and more than 200,000 patient-years of billing data identified associations with additional phenotypes: thyroiditis (OR = 0.58, p = 1.4 × 10⁻⁵), nodular (OR = 0.76, p = 3.1 × 10⁻⁵) and multinodular (OR = 0.69, p = 3.9 × 10⁻⁵) goiters, and thyrotoxicosis (OR = 0.76, p = 1.5 × 10⁻³), but not Graves disease (OR = 1.03, p = 0.82). Thyroid cancer, previously associated with this locus, was not significantly associated in the PheWAS (OR = 1.29, p = 0.09). The strongest association in the PheWAS was hypothyroidism (OR = 0.76, p = 2.7 × 10⁻¹³), which had an odds ratio that was nearly identical to that of the curated case-control population in the primary analysis, providing further validation of the PheWAS method. Our findings indicate that EMR-linked genomic data could allow discovery of genes associated with many diseases without additional genotyping cost.
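The odds ratios quoted above come from the study's own association models, which the abstract does not detail. As a rough illustration of what an odds ratio below 1 means, the following sketch computes an unadjusted odds ratio and an approximate 95% confidence interval from a 2×2 table of allele counts using the standard log-odds (Woolf) method. The function name `odds_ratio_ci` and all counts are hypothetical and are not data from the study.

```python
import math


def odds_ratio_ci(case_alt, case_ref, ctrl_alt, ctrl_ref, z=1.96):
    """Unadjusted odds ratio with an approximate confidence interval.

    Arguments are counts of the alternate and reference allele in cases
    and controls; all four must be positive. z=1.96 gives a ~95% interval.
    """
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    # Standard error of log(OR) under the Woolf approximation.
    se_log_or = math.sqrt(1 / case_alt + 1 / case_ref + 1 / ctrl_alt + 1 / ctrl_ref)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical allele counts: the alternate allele is rarer in cases,
# so the odds ratio falls below 1 (a protective direction of effect).
or_value, ci_low, ci_high = odds_ratio_ci(10, 20, 30, 40)
print(round(or_value, 3), round(ci_low, 3), round(ci_high, 3))
```

An interval that excludes 1 corresponds to a nominally significant association; published analyses typically also adjust for covariates such as age, sex, and ancestry, which this sketch omits.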
Large-scale DNA databanks linked to electronic medical record (EMR) systems have been proposed as an approach for rapidly generating large, diverse cohorts for discovery and replication of genotype-phenotype associations. However, the extent to which such resources are capable of delivering on this promise is unknown. We studied whether an EMR-linked DNA biorepository can be used to detect known genotype-phenotype associations for five diseases. Twenty-one SNPs previously implicated as common variants predisposing to atrial fibrillation, Crohn disease, multiple sclerosis, rheumatoid arthritis, or type 2 diabetes were successfully genotyped in 9483 samples accrued over 4 mo into BioVU, the Vanderbilt University Medical Center DNA biobank. Previously reported odds ratios (OR(PR)) ranged from 1.14 to 2.36. For each phenotype, natural language processing techniques and billing-code queries were used to identify cases (n = 70-698) and controls (n = 808-3818) from deidentified health records. Each of the 21 tests of association yielded point estimates in the expected direction. Previous genotype-phenotype associations were replicated (p < 0.05) in 8/14 cases when the OR(PR) was > 1.25, and in 0/7 with lower OR(PR). Statistically significant associations were detected in all analyses that were adequately powered. In each of the five diseases studied, at least one previously reported association was replicated. These data demonstrate that phenotypes representing clinical diagnoses can be extracted from EMR systems, and they support the use of DNA resources coupled to EMR systems as tools for rapid generation of large data sets required for replication of associations found in research cohorts and for discovery in genome science.