Polygenic risk scores have shown great promise in predicting complex disease risk and will become more accurate as training sample sizes increase. The standard approach for calculating risk scores involves linkage disequilibrium (LD)-based marker pruning and applying a p value threshold to association statistics, but this discards information and can reduce predictive accuracy. We introduce LDpred, a method that infers the posterior mean effect size of each marker by using a prior on effect sizes and LD information from an external reference panel. Theory and simulations show that LDpred outperforms the approach of pruning followed by thresholding, particularly at large sample sizes. Accordingly, prediction R² increased from 20.1% to 25.3% in a large schizophrenia dataset and from 9.8% to 12.0% in a large multiple sclerosis dataset. A similar relative improvement in accuracy was observed for three additional large disease datasets and for non-European schizophrenia samples. The advantage of LDpred over existing methods will grow as sample sizes increase.
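The pruning-plus-thresholding baseline described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the greedy ordering by p value, and the default cutoffs `r2_max` and `p_max` are assumptions made for the example, and the LD matrix is assumed to hold pairwise r² values from a reference panel.

```python
import numpy as np

def prune_and_threshold(p_values, ld_r2, r2_max=0.2, p_max=1e-3):
    """Greedy LD pruning in order of significance, then p-value thresholding.

    p_values : (M,) association p values, one per marker
    ld_r2    : (M, M) pairwise r^2 between markers (assumed from a reference panel)
    r2_max, p_max : hypothetical cutoffs, chosen only for illustration
    """
    order = np.argsort(p_values)  # most significant marker first
    kept = []
    for j in order:
        if p_values[j] > p_max:
            # Every remaining marker is less significant, and a marker can only
            # be pruned by a more significant one, so stopping here gives the
            # same set as pruning all markers and thresholding afterwards.
            break
        if all(ld_r2[j, k] < r2_max for k in kept):
            kept.append(int(j))
    return sorted(kept)

def risk_score(genotypes, beta_hat, kept):
    """Sum of marginal effect estimates times allele counts over kept markers."""
    return genotypes[:, kept] @ beta_hat[kept]
```

Every marker that is pruned or falls above the threshold contributes nothing to the score, which is the loss of information the abstract refers to.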
Background: Barrett’s Esophagus (BE) is the precursor and the biggest risk factor for esophageal adenocarcinoma (EAC), the solid cancer with the fastest rising incidence in the US and western world. Current strategies to decrease morbidity and mortality from EAC have focused on identifying and surveying patients with BE using upper endoscopy. An accurate estimate of the number of patients with BE in the population is important to inform public health policy and to prioritize resources for potential screening and management programs. However, the true prevalence of BE is difficult to ascertain because the condition frequently is symptomatically silent, and the numerous clinical studies that have analyzed BE prevalence have produced a wide range of estimates. The aim of this study was to use a computer simulation disease model of EAC to determine the estimates for BE prevalence that best align with US SEER cancer registry data.
Methods: A previously developed mathematical model of EAC was modified to perform this analysis. The model consists of six health states: Normal, GERD, BE, Undetected Cancer, Detected Cancer and Death. Published estimates of the transition rates between these states were used to set bounds. Across the one million computer simulations that were performed, these transition rates were systematically varied, producing differing prevalences for the health states. Two filters were applied sequentially to select the simulations most consistent with clinical data. First, among the one million simulations, the 1,000 that best reproduced SEER cancer incidence data were selected. Next, of those 1,000 simulations, the 100 whose overall calculated BE-to-Detected-Cancer rates were closest to published estimates were selected. Finally, the prevalence of BE in this final set of 100 simulations was analyzed.
Results: We present histogram data depicting BE prevalences for all one million simulations, the 1,000 simulations that best approximate SEER data, and the final set of 100 simulations. Using the best 100 simulations, we estimate the prevalence of BE to be 5.6% [5.49–5.70%].
Conclusions: Using our model, an estimated prevalence for BE in the general population of 5.6% [5.49–5.70%] accurately predicts incidence rates for EAC reported to the US SEER cancer registry. Future clinical studies are needed to confirm our estimate.
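The two sequential filters in the Methods can be sketched as a pair of rank-and-truncate steps. The code below is a hedged illustration only: the abstract does not say how agreement with SEER data was scored, so `seer_error` is a hypothetical per-simulation misfit value (lower is better), and `target_rate` stands in for the published BE-to-Detected-Cancer estimate, which the abstract does not give.

```python
import numpy as np

def two_stage_filter(seer_error, progression_rate, target_rate,
                     n_first=1000, n_second=100):
    """Return indices of the simulations that survive both filters.

    seer_error       : (S,) hypothetical misfit of each simulation to SEER incidence
    progression_rate : (S,) simulated BE-to-Detected-Cancer rate per simulation
    target_rate      : published estimate to match (placeholder value in practice)
    """
    seer_error = np.asarray(seer_error, dtype=float)
    progression_rate = np.asarray(progression_rate, dtype=float)

    # Filter 1: keep the n_first simulations with the smallest SEER misfit.
    first = np.argsort(seer_error)[:n_first]

    # Filter 2: among those, keep the n_second simulations whose progression
    # rate is closest to the published estimate.
    distance = np.abs(progression_rate[first] - target_rate)
    return first[np.argsort(distance)[:n_second]]
```

Given a per-simulation array of BE prevalences, indexing it with the returned indices yields the values summarized in the final estimate. Because the filters are sequential, a simulation that matches the progression rate exactly is still discarded if it falls outside the first cut.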
We introduce a liability-threshold mixed linear model (LTMLM) association statistic for case-control studies and show that it has a well-controlled false-positive rate and more power than existing mixed-model methods for diseases with low prevalence. Existing mixed-model methods suffer a loss in power under case-control ascertainment, but no solution has been proposed. Here, we solve this problem by using a χ² score statistic computed from posterior mean liabilities (PMLs) under the liability-threshold model. Each individual's PML is conditional not only on that individual's case-control status but also on every individual's case-control status and the genetic relationship matrix (GRM) obtained from the data. The PMLs are estimated with a multivariate Gibbs sampler; the liability-scale phenotypic covariance matrix is based on the GRM, and a heritability parameter is estimated via Haseman-Elston regression on case-control phenotypes and then transformed to the liability scale. In simulations of unrelated individuals, the LTMLM statistic was correctly calibrated and achieved higher power than existing mixed-model methods for diseases with low prevalence, and the magnitude of the improvement depended on sample size and severity of case-control ascertainment. In a Wellcome Trust Case Control Consortium 2 multiple sclerosis dataset with >10,000 samples, LTMLM was correctly calibrated and attained a 4.3% improvement (p = 0.005) in χ² statistics over existing mixed-model methods at 75 known associated SNPs, consistent with simulations. Larger increases in power are expected at larger sample sizes. In conclusion, case-control studies of diseases with low prevalence can achieve power higher than that in existing mixed-model methods.
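The heritability step mentioned above (Haseman-Elston regression followed by conversion to the liability scale) can be sketched as below. This is an assumption-laden illustration rather than the authors' code: the regression is done through the origin on standardized phenotypes, and the conversion uses a commonly cited observed-to-liability formula, which may differ in detail from what the paper applies. The function names, `prevalence`, and `case_fraction` are hypothetical.

```python
from statistics import NormalDist
import numpy as np

def haseman_elston(grm, phenotype):
    """Observed-scale heritability: regress phenotype products on GRM entries.

    Uses only pairs i < j and a no-intercept least-squares slope (an assumption).
    """
    y = (phenotype - phenotype.mean()) / phenotype.std()
    upper = np.triu_indices_from(grm, k=1)   # off-diagonal pairs i < j
    a = grm[upper]
    products = np.outer(y, y)[upper]
    return float(a @ products / (a @ a))

def to_liability_scale(h2_obs, prevalence, case_fraction):
    """Convert observed-scale heritability to the liability scale.

    prevalence    : population disease prevalence K
    case_fraction : proportion of cases in the study sample P
    """
    threshold = NormalDist().inv_cdf(1.0 - prevalence)  # liability threshold
    z = NormalDist().pdf(threshold)                     # normal density at threshold
    k, p = prevalence, case_fraction
    return h2_obs * (k * (1.0 - k)) ** 2 / (z ** 2 * p * (1.0 - p))
```

The conversion factor grows as prevalence falls below the sample case fraction, which is the low-prevalence, strongly ascertained setting in which the abstract reports the largest gains.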