Genetic association results are often interpreted with the assumption that study participation does not affect downstream analyses. Understanding the genetic basis of this participation bias is challenging as it requires the genotypes of unseen individuals. However, we demonstrate that it is possible to estimate comparative biases by performing GWAS contrasting one subgroup versus another. For example, we show that sex exhibits autosomal heritability in the presence of sex-differential participation bias. By performing a GWAS of sex in ~3.3 million males and females, we identify over 150 autosomal loci significantly associated with sex and highlight complex traits underpinning differences in study participation between sexes. For example, the body mass index (BMI) increasing allele at the FTO locus was observed at higher frequency in males compared to females (OR 1.02 [1.02-1.03], P = 4.4 × 10⁻³⁶). Finally, we demonstrate how these biases can potentially lead to incorrect inferences in downstream analyses and propose a conceptual framework for addressing such biases. Our findings highlight a new challenge that genetic studies may face as sample sizes continue to grow.

Individuals who enroll in research studies or purchase direct-to-consumer genetic tests are often not representative of the general population [1,2,3]. For example, the UK Biobank study invited ~9 million individuals and achieved an overall participation rate of 5.45% [4]. These enrolled individuals clearly demonstrated a "healthy volunteer bias", with lower rates of obesity, smoking and fewer self-reported health conditions than the sampling frame [4]. Achieving accurate representation of the sampling population in any study is challenging. Examples do exist, however, such as the iPSYCH study which enrolled a random sample of the population, based on DNA extracted from a nationwide collection of neonatal dried blood spots [5].
The benefits of achieving such representativeness have long been discussed [6,7,8,9], with many arguing that unrepresentative samples can bias prevalence estimates but do not necessarily create substantial biases in exposure-disease associations [10,11]. Purposely non-representative study designs can also be valuable; for example, case-control studies seeking to enrich cases with non-genetic risk factors can maximize power to detect genetic effects [12].

Recent studies have highlighted that genetic factors are associated with aspects of study engagement [13,14,15]. For example, individuals with high genetic risk for schizophrenia enrolled in a study are less likely to complete health questionnaires, attend clinical assessments and continue participation in longitudinal studies than those with lower genetic risk [13,16]. It remains unclear to what extent genetic factors influence initial study participation, or what the downstream consequences of such bias are, though there are prior attempts to quantify the bias with simul...