Linear mixed models (LMMs) have emerged as the method of choice for confounded genome-wide association studies. However, the performance of LMMs in nonrandomly ascertained case-control studies deteriorates with increasing sample size. We propose a framework called LEAP (Liability Estimator As a Phenotype; https://github.com/omerwe/LEAP) that tests for association with estimated latent values corresponding to severity of phenotype, and demonstrate that this can lead to a substantial power increase.
Main Text
In recent years, genome-wide association studies (GWAS) have uncovered thousands of risk variants for genetic traits 1. Only a small fraction of disease variance is explained by discovered variants, possibly because contemporary sample sizes are relatively small and causal variants tend to have small effect sizes 2. To identify such variants, future studies will need to include hundreds of thousands of individuals.
Population structure and family relatedness 3 lead to spurious results and an increased type I error rate. As sample sizes continue to increase, this difficulty becomes even more severe, because larger samples are more likely to include individuals with a different genetic ancestry, or related individuals.
Recently, LMMs have emerged as the method of choice for GWAS, due to their robustness to diverse sources of confounding 3. LMMs gain resilience to confounding by testing for association conditioned on pairwise kinship coefficients between study subjects. Although designed for continuous phenotypes, LMMs have been successfully used in several large case-control GWAS 4-6, because alternative methods cannot capture diverse sources of confounding 3.
However, LMMs in ascertained case-control studies, wherein cases are oversampled relative to the disease prevalence, lose power with increasing sample size compared to alternative methods 7. This loss is due to several model violations: dependence between tested causal variants and variants used to estimate kinship, dependence between genetic and environmental effects, and use of a non-continuous trait (Supplementary Note). Thus, the use of LMMs resolves the difficulty of sensitivity to confounding, but introduces a different difficulty in its place.
A possible remedy is to test for associations with a model that directly represents the case-control phenotype and takes the ascertainment scheme into account (Supplementary Note).
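
To make the notion of ascertainment concrete, the following minimal sketch simulates a population and then draws a case-control study in which cases are oversampled relative to the disease prevalence. It is an illustration only, not the LEAP procedure. It assumes that disease status arises from thresholding a latent standard normal variable (the liability threshold model discussed in this section). The function name `simulate_ascertained_study`, the 1% prevalence, and the sample sizes are hypothetical choices made for the example.

```python
import random
from statistics import NormalDist


def simulate_ascertained_study(prevalence=0.01, n_population=200_000,
                               n_cases=500, n_controls=500, seed=0):
    """Draw an ascertained case-control sample from a simulated population.

    Assumption: each individual has a standard normal latent liability and
    is a case when that liability exceeds a cutoff set by the prevalence.
    """
    rng = random.Random(seed)
    # Cutoff t such that P(liability > t) equals the prevalence.
    cutoff = NormalDist().inv_cdf(1.0 - prevalence)
    liabilities = [rng.gauss(0.0, 1.0) for _ in range(n_population)]
    cases = [liab for liab in liabilities if liab > cutoff]
    controls = [liab for liab in liabilities if liab <= cutoff]
    population_prevalence = len(cases) / n_population
    # Ascertainment: sample a fixed number of cases and of controls,
    # regardless of how rare cases are in the population.
    study = ([(liab, 1) for liab in rng.sample(cases, n_cases)]
             + [(liab, 0) for liab in rng.sample(controls, n_controls)])
    return study, cutoff, population_prevalence


study, cutoff, population_prevalence = simulate_ascertained_study()
study_case_fraction = sum(status for _, status in study) / len(study)
print(f"liability cutoff:        {cutoff:.3f}")
print(f"population prevalence:   {population_prevalence:.4f}")
print(f"case fraction in study:  {study_case_fraction:.2f}")
```

In this sketch the study contains equal numbers of cases and controls even though cases are rare in the population, which is the kind of nonrandom ascertainment the text refers to.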
Such models assume that observed case-control phenotypes are generated by an unobserved stochastic process with a well-defined distribution. One prominent example is the liability threshold model 8, which associates individuals with a latent normally distributed variable called the liability, such that cases are individuals whose liability exceeds a given cutoff. Despite their elegance, such models are extremely computationally expensive, rendering whole genome association tests infeasible in most circumstances.
As an alternative, we propose approximating such models by first estimating latent liability values and model param...