Bayes factors (BFs) are becoming increasingly important tools in genetic association studies, partly because they provide a natural framework for including prior information. The Wakefield BF (WBF) approximation is easy to calculate and assumes a normal prior on the log odds ratio (logOR) with a mean of zero. However, the prior variance (W) must be specified. Because of the potentially high sensitivity of the WBF to the choice of W, we propose several new BF approximations with logOR ∼N(0,W), but allow W to take a probability distribution rather than a fixed value. We provide several prior distributions for W which lead to BFs that can be calculated easily in freely available software packages. These priors allow a wide range of densities for W and provide considerable flexibility. We examine some properties of the priors and BFs and show how to determine the most appropriate prior based on elicited quantiles of the prior odds ratio (OR). We show by simulation that our novel BFs have superior true‐positive rates at low false‐positive rates compared to those from both P‐value and WBF analyses across a range of sample sizes and ORs. We give an example of utilizing our BFs to fine‐map the CASP8 region using genotype data on approximately 46,000 breast cancer case and 43,000 healthy control samples from the Collaborative Oncological Gene‐environment Study (COGS) Consortium, and compare the single‐nucleotide polymorphism ranks to those obtained using WBFs and P‐values from univariate logistic regression.
Genome-wide association studies have successfully identified associations between common diseases and a large number of single nucleotide polymorphisms (SNPs) across the genome. We investigate the effectiveness of several statistics, including p-values, likelihoods, genetic map distance and linkage disequilibrium between SNPs, in filtering SNPs in several disease-associated regions. We use simulated data to compare the efficacy of filters with different sample sizes and for causal SNPs with different minor allele frequencies (MAFs) and effect sizes, focusing on the small effect sizes and MAFs likely to represent the majority of unidentified causal SNPs. In our analyses, of all the methods investigated, filtering on the ranked likelihoods consistently retains the true causal SNP with the highest probability for a given false positive rate. This was the case for all the local linkage disequilibrium patterns investigated. Our results indicate that when using this method to retain only the top 5% of SNPs, even a causal SNP with an odds ratio of 1.1 and MAF of 0.08 can be retained with a probability exceeding 0.9 using an overall sample size of 50,000.
There is a large amount of functional genetic data available, which can be used to inform fine‐mapping association studies (in diseases with well‐characterised disease pathways). Single nucleotide polymorphism (SNP) prioritization via Bayes factors is attractive because prior information can inform the effect size or the prior probability of causal association. This approach requires the specification of the effect size. If the information needed to estimate a priori the probability density for the effect sizes for causal SNPs in a genomic region isn't consistent or isn't available, then specifying a prior variance for the effect sizes is challenging. We propose both an empirical method to estimate this prior variance, and a coherent approach to using SNP‐level functional data, to inform the prior probability of causal association. Through simulation we show that when ranking SNPs by our empirical Bayes factor in a fine‐mapping study, the causal SNP rank is generally as high or higher than the rank using Bayes factors with other plausible values of the prior variance. Importantly, we also show that assigning SNP‐specific prior probabilities of association based on expert prior functional knowledge of the disease mechanism can lead to improved causal SNPs ranks compared to ranking with identical prior probabilities of association. We demonstrate the use of our methods by applying the methods to the fine mapping of the CASP8 region of chromosome 2 using genotype data from the Collaborative Oncological Gene‐Environment Study (COGS) Consortium. The data we analysed included approximately 46,000 breast cancer case and 43,000 healthy control samples.
Potential predictive biomarkers are often measured on a continuous scale, but in practice, a threshold value to divide the patient population into biomarker ‘positive’ and ‘negative’ is desirable. Early phase clinical trials are increasingly using biomarkers for patient selection, but at this stage, it is likely that little will be known about the relationship between the biomarker and the treatment outcome. We describe a single-arm trial design with adaptive enrichment, which can increase power to demonstrate efficacy within a patient subpopulation, the parameters of which are also estimated. Our design enables us to learn about the biomarker and optimally adjust the threshold during the study, using a combination of generalised linear modelling and Bayesian prediction. At the final analysis, a binomial exact test is carried out, allowing the hypothesis that ‘no population subset exists in which the novel treatment has a desirable response rate’ to be tested. Through extensive simulations, we are able to show increased power over fixed threshold methods in many situations without increasing the type-I error rate. We also show that estimates of the threshold, which defines the population subset, are unbiased and often more precise than those from fixed threshold studies. We provide an example of the method applied (retrospectively) to publically available data from a study of the use of tamoxifen after mastectomy by the German Breast Study Group, where progesterone receptor is the biomarker of interest.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2025 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.