The contribution of gene-environment (GxE) interactions to many human traits and diseases is poorly characterised. We propose a method, LEMMA, that estimates an interpretable environmental score (ES) that interacts with genetic markers throughout the genome. When applied to body mass index, systolic, diastolic and pulse pressure in the UK Biobank we estimate that 9.3%, 3.9%, 1.6% and 12.5% of phenotypic variance is explained by GxE interactions, and that rare variants explain most of this variance. We also identify 3 loci that interact with the estimated environmental scores (−log10(p) > 7).
Introduction
Despite longstanding interest in gene-by-environment (GxE) interactions 1, this facet of genetic architecture remains poorly characterised in humans. Detection of GxE interactions is inherently more difficult than detection of additive genetic effects in genome-wide association studies (GWAS). One difficulty is that of sample size; a commonly cited rule of thumb suggests that detection of interaction effects requires a sample size at least four times larger than that required to detect a main effect of comparable effect size 2. Another is that an individual's environment, which unfolds over time, is inherently high dimensional and very hard to measure comprehensively. In addition, there are many environmental variables that could plausibly interact with the genome and many ways to combine them, and typically these variables are not all present in the same dataset.
The recently released UK Biobank 3 dataset, a large population cohort study with deep genotyping and sequencing and extensive phenotyping, offers a unique opportunity to explore GxE effects 4-10.
Models that consider environmental variables jointly can be advantageous, particularly if several environmental variables drive interactions at individual loci or if an unobserved environment that drives interactions is better reflected by a combination of observed environments. StructLMM 7 models the environmental similarity between individuals (over multiple environments) as a random effect, and then tests each SNP independently for GxE interactions, but does not model the genome-wide contribution of all the markers, which is often a major component of phenotypic variance.
Advances in methods applied to detect genetic main effects in standard GWAS have shown that linear mixed models (LMMs) can reduce false positive associations due to population structure, and improve power by implicitly conditioning on other loci across the genome [11][12][13].
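As an illustration of how an LMM can condition on loci across the genome, the sketch below builds a genetic relationship matrix (GRM) from simulated genotypes. It is a minimal example under stated assumptions, not the construction used by any of the cited methods: the function name `genetic_relationship_matrix` is hypothetical, SNPs are standardised assuming Hardy-Weinberg equilibrium, and the genotypes are simulated as independent binomial draws.

```python
import numpy as np


def genetic_relationship_matrix(genotypes: np.ndarray) -> np.ndarray:
    """Return K = Z Z^T / M for an (N individuals x M SNPs) allele-count matrix.

    Hypothetical helper for illustration only. Each SNP is centred by its
    sample mean and scaled by sqrt(2p(1-p)), the Hardy-Weinberg standard
    deviation implied by its sample allele frequency p.
    """
    G = np.asarray(genotypes, dtype=float)
    freq = G.mean(axis=0) / 2.0                # per-SNP sample allele frequency
    keep = (freq > 0.0) & (freq < 1.0)         # drop monomorphic SNPs (zero variance)
    G, freq = G[:, keep], freq[keep]
    Z = (G - 2.0 * freq) / np.sqrt(2.0 * freq * (1.0 - freq))
    return Z @ Z.T / Z.shape[1]


# Simulated data (assumption): 200 unrelated individuals, 1000 independent SNPs.
rng = np.random.default_rng(0)
n, m = 200, 1000
maf = rng.uniform(0.05, 0.5, size=m)
genotypes = rng.binomial(2, maf, size=(n, m))

K = genetic_relationship_matrix(genotypes)
print(K.shape, round(float(np.diag(K).mean()), 3))
```

In an LMM of this kind, the polygenic term would be given covariance proportional to `K`, so that each single-SNP test is adjusted for relatedness captured by all other markers. For unrelated simulated individuals the diagonal of `K` averages close to one and the off-diagonal entries are close to zero.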
Often these methods model the unobserved polygenic contribution as a multivariate Gaussian with covariance structure proportional to a genetic relationship matrix (GRM) [14][15][16]. This approach is […] (RHE) regression on UK Biobank scale datasets 22,23. This heritability analysis can be run on genotyped or imputed SNPs and stratified by MAF and LD to better interrogate the genetic architecture of GxE interactions. The ES is also used to test for GxE interactions one variant at a time, typically at a larger set of i...