In genome-wide association studies (GWAS) for thousands of phenotypes in large biobanks, most binary traits have substantially fewer cases than controls. Both of the widely used approaches, the linear mixed model and the recently proposed logistic mixed model, perform poorly; they produce large type I error rates when used to analyze unbalanced case-control phenotypes. Here we propose a scalable and accurate generalized mixed model association test that uses the saddlepoint approximation to calibrate the distribution of score test statistics. This method, SAIGE (Scalable and Accurate Implementation of GEneralized mixed model), provides accurate P values even when case-control ratios are extremely unbalanced. SAIGE uses state-of-the-art optimization strategies to reduce computational costs; hence, it is applicable to GWAS for thousands of phenotypes in large biobanks. Through the analysis of UK Biobank data of 408,961 samples from white British participants with European ancestry for >1,400 binary phenotypes, we show that SAIGE can efficiently analyze large sample data, controlling for unbalanced case-control ratios and sample relatedness.
In genome-wide association studies (GWAS) for thousands of phenotypes in large biobanks, most binary traits have substantially fewer cases than controls. Both of the widely used approaches, the linear mixed model and the recently proposed logistic mixed model, perform poorly, producing large type I error rates, in the analysis of phenotypes with unbalanced case-control ratios. Here we propose a scalable and accurate generalized mixed model association test that uses the saddlepoint approximation (SPA) to calibrate the distribution of score test statistics. This method, SAIGE, provides accurate p-values even when case-control ratios are extremely unbalanced. It uses state-of-the-art optimization strategies to reduce the computational time and memory cost of fitting the generalized mixed model. The computation cost depends linearly on sample size, so the method is applicable to GWAS for thousands of phenotypes in large biobanks. Through the analysis of UK Biobank data of 408,961 white British European-ancestry samples, we show that SAIGE can efficiently analyze large sample data, controlling for unbalanced case-control ratios and sample relatedness.
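To make the calibration idea concrete, the following is a minimal sketch of a saddlepoint approximation for a score statistic, not SAIGE's implementation. It assumes independent samples (no mixed-model relatedness term), fitted null case probabilities `mu`, and a score of the form S = sum_i g_i (y_i - mu_i). The function names, the bisection solver, the bracket limit of 64, and the `cutoff=2.0` rule for falling back to the normal approximation near the mean are all illustrative choices. The tail formula is the Barndorff-Nielsen form of the saddlepoint approximation.

```python
import math


def normal_sf(z):
    """Upper-tail probability P(Z >= z) of a standard normal."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def cgf(t, g, mu):
    """CGF K(t) of S = sum_i g_i * (y_i - mu_i), with y_i ~ Bernoulli(mu_i)."""
    return sum(math.log(1.0 - m + m * math.exp(gi * t)) - t * gi * m
               for gi, m in zip(g, mu))


def cgf_d1(t, g, mu):
    """First derivative K'(t); increasing in t, with K'(0) = 0."""
    total = 0.0
    for gi, m in zip(g, mu):
        e = math.exp(gi * t)
        total += gi * m * e / (1.0 - m + m * e) - gi * m
    return total


def cgf_d2(t, g, mu):
    """Second derivative K''(t); K''(0) is the null variance of S."""
    total = 0.0
    for gi, m in zip(g, mu):
        e = math.exp(gi * t)
        d = 1.0 - m + m * e
        total += gi * gi * m * (1.0 - m) * e / (d * d)
    return total


def normal_tail(s, g, mu):
    """One-sided tail probability of S at s under the normal approximation."""
    return normal_sf(abs(s) / math.sqrt(cgf_d2(0.0, g, mu)))


def spa_tail(s, g, mu, cutoff=2.0):
    """One-sided tail via SPA: P(S >= s) if s > 0, else P(S <= s)."""
    sd = math.sqrt(cgf_d2(0.0, g, mu))
    if abs(s) < cutoff * sd:
        # Near the mean the normal approximation is adequate (assumed rule).
        return normal_tail(s, g, mu)
    sign = 1.0 if s > 0 else -1.0
    # Bracket the saddlepoint, i.e. the root of K'(t) = s, by doubling.
    b = sign
    while (cgf_d1(b, g, mu) - s) * sign < 0:
        b *= 2.0
        if abs(b) > 64.0:
            raise ValueError("s is outside the attainable range of S")
    lo, hi = min(0.0, b), max(0.0, b)
    for _ in range(200):  # plain bisection; K' is monotone
        mid = 0.5 * (lo + hi)
        if cgf_d1(mid, g, mu) < s:
            lo = mid
        else:
            hi = mid
    t = 0.5 * (lo + hi)
    w = sign * math.sqrt(2.0 * (t * s - cgf(t, g, mu)))
    v = t * math.sqrt(cgf_d2(t, g, mu))
    z = w + math.log(v / w) / w
    return normal_sf(sign * z)


# Toy data (made up): 1,000 samples, 1% case probability, genotype 1 for all,
# and 25 cases observed, so S is a centered binomial count.
g = [1.0] * 1000
mu = [0.01] * 1000
s = 25 - sum(gi * m for gi, m in zip(g, mu))
print(normal_tail(s, g, mu), spa_tail(s, g, mu))
```

On this toy input the normal approximation returns a tail probability more than an order of magnitude smaller than the saddlepoint value, which is the kind of miscalibration that inflates type I error when cases are rare.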
Background: The phecode system was built upon the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) for phenome-wide association studies (PheWAS) using the electronic health record (EHR).
Objective: The goal of this paper was to develop and perform an initial evaluation of maps from the International Classification of Diseases, 10th Revision (ICD-10) and the International Classification of Diseases, 10th Revision, Clinical Modification (ICD-10-CM) codes to phecodes.
Methods: We mapped ICD-10 and ICD-10-CM codes to phecodes using a number of methods and resources, such as concept relationships and explicit mappings from the Centers for Medicare & Medicaid Services, the Unified Medical Language System, Observational Health Data Sciences and Informatics, Systematized Nomenclature of Medicine-Clinical Terms, and the National Library of Medicine. We assessed the coverage of the maps in two databases: Vanderbilt University Medical Center (VUMC) using ICD-10-CM and the UK Biobank (UKBB) using ICD-10. We assessed the fidelity of the ICD-10-CM map in comparison to the gold-standard ICD-9-CM phecode map by investigating phenotype reproducibility and conducting a PheWAS.
Results: We mapped >75% of ICD-10 and ICD-10-CM codes to phecodes. Of the unique codes observed in the UKBB (ICD-10) and VUMC (ICD-10-CM) cohorts, >90% were mapped to phecodes. We observed 70-75% reproducibility for chronic diseases and <10% for an acute disease for phenotypes sourced from the ICD-10-CM phecode map. Using the ICD-9-CM and ICD-10-CM maps, we conducted a PheWAS with a Lipoprotein(a) genetic variant, rs10455872, which replicated two known genotype-phenotype associations with similar effect sizes: coronary atherosclerosis (ICD-9-CM: P<.001; odds ratio (OR) 1.60 [95% CI 1.43-1.80] vs ICD-10-CM: P<.001; OR 1.60 [95% CI 1.43-1.80]) and chronic ischemic heart disease (ICD-9-CM: P<.001; OR 1.56 [95% CI 1.35-1.79] vs ICD-10-CM: P<.001; OR 1.47 [95% CI 1.22-1.77]).
Conclusions: This study introduces the beta versions of ICD-10 and ICD-10-CM to phecode maps that enable researchers to leverage accumulated ICD-10 and ICD-10-CM data for PheWAS in the EHR.
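The coverage assessment described above (the share of unique observed codes that have a phecode) can be sketched as follows. This is only an illustration: the function names are hypothetical, and the three map entries and the observed codes are toy values chosen for the example, to be replaced with the published map files and a real cohort extract.

```python
def map_to_phecodes(observed_codes, icd_to_phecode):
    """Split unique observed ICD codes into mapped and unmapped sets.

    Returns (mapped, unmapped), where mapped is {icd_code: set_of_phecodes}.
    """
    mapped, unmapped = {}, set()
    for code in set(observed_codes):
        phecodes = icd_to_phecode.get(code)
        if phecodes:
            mapped[code] = set(phecodes)
        else:
            unmapped.add(code)
    return mapped, unmapped


def coverage(observed_codes, icd_to_phecode):
    """Fraction of unique observed codes that map to at least one phecode."""
    unique = set(observed_codes)
    if not unique:
        return 0.0
    mapped, _ = map_to_phecodes(unique, icd_to_phecode)
    return len(mapped) / len(unique)


# Toy map and toy cohort (illustrative values, not the published map).
toy_map = {
    "I25.10": ["411.4"],
    "E11.9": ["250.2"],
    "J45.909": ["495"],
}
observed = ["I25.10", "I25.10", "E11.9", "J45.909", "Z00.00"]

mapped, unmapped = map_to_phecodes(observed, toy_map)
print(f"coverage = {coverage(observed, toy_map):.0%}, unmapped = {sorted(unmapped)}")
```

Counting unique codes rather than code occurrences matches the way the abstract reports coverage; weighting by occurrence would answer a different question (how much of the billing volume is captured).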
Modern MRI image processing methods have yielded quantitative, morphometric, functional, and structural assessments of the human brain. These analyses typically exploit carefully optimized protocols for specific imaging targets. Algorithm investigators have several excellent public data resources to use to test, develop, and optimize their methods. Recently, there has been an increasing focus on combining MRI protocols in multi-parametric studies. Notably, these have included innovative approaches for fusing connectivity inferences with functional and/or anatomical characterizations. Yet, validation of the reproducibility of these interesting and novel methods has been severely hampered by the limited availability of appropriate multi-parametric data. We present an imaging protocol optimized to include state-of-the-art assessment of brain function, structure, micro-architecture, and quantitative parameters within a clinically feasible 60 minute protocol on a 3T MRI scanner. We present scan-rescan reproducibility of these imaging contrasts based on 21 healthy volunteers (11 M/10 F, 22-61 y/o). The cortical gray matter, cortical white matter, ventricular cerebrospinal fluid, thalamus, putamen, caudate, cerebellar gray matter, cerebellar white matter, and brainstem were identified with mean volume-wise reproducibility of 3.5%. We tabulate the mean intensity, variability and reproducibility of each contrast in a region of interest approach, which is essential for prospective study planning and retrospective power analysis considerations. Anatomy was highly consistent on structural acquisition (~1-5% variability), while variation on diffusion and several other quantitative scans was higher (~<10%). Some sequences are particularly variable in specific structures (ASL exhibited variation of 28% in the cerebral white matter) or in thin structures (quantitative T2 varied by up to 73% in the caudate) due, in large part, to variability in automated ROI placement. The richness of the joint distribution of intensities across imaging methods can be best assessed within the context of a particular analysis approach as opposed to a summary table. As such, all imagi...
Corresponding author: Bennett A. Landman, PhD, Vanderbilt University EECS, 2301 Vanderbilt Pl., PO Box 351679 Station B, Nashville, TN 37235-1679, Work: 410-917-6166, bennett.landman@vanderbilt.edu. Neuroimage. Author manuscript; available in PMC 2012 February 14.