Genome-wide association studies have been effective at revealing the genetic architecture of simple traits. Extending this approach to more complex phenotypes has necessitated a massive increase in cohort size. To achieve sufficient power, participants are recruited across multiple collaborating institutions, leaving researchers with two choices: either collect all the raw data at a single institution or rely on meta-analyses to test for association. In this work, we present a third alternative: we implement an entire GWAS workflow (quality control, population structure control, and association) in a fully decentralized setting. Our iterative approach (a) does not rely on consolidating the raw data at a single coordination center, and (b) does not hinge upon large-sample-size assumptions at each silo. As we show, our approach overcomes challenges faced by meta-studies when associating rare alleles and when case/control proportions are wildly imbalanced across silos. We demonstrate the feasibility of our method in cohorts ranging in size from 2K (small) to 500K (large), recruited across 2 to 10 collaborating institutions.

Under Preparation

Introduction

Genome-wide association studies (GWAS) are a popular approach to elucidating the genetic architecture of human phenotypes. This design has led to the discovery of many novel loci underpinning a panoply of human traits (see Visscher et al. for a recent review (1)). For traits driven by a few variants with large effects, moderately sized cohorts have been sufficient to power discovery. However, the GWAS framework demands increasingly larger cohort sizes as the complexity of the trait grows. To achieve the required statistical power today, large, multi-institutional consortia are assembled under a common data-sharing agreement.
Meta-analysis (3; 4) or a combination of meta- and mega-analysis (centralized analysis) (5; 6; 7; 8) constitute the two major approaches to conducting GWAS. Each approach offers merits and shortcomings. For mega-analysis, collecting all the data at every analysis core is not only expensive and time-consuming but also creates a security vulnerability at each institution that hosts a copy of the data. Conversely, the meta-analysis approach eliminates the need for data replication but is more limited in flexibility. In particular, (a) subtle differences in models, assumptions, and quality control (QC) can introduce biases in the results (9); (b) the shared data and summary statistics might be inadequate for some types of inference (e.g., individual-level population structure control, conditional or joint analysis); and (c) parameter estimates can be unreliable for rare variants or from centers contributing small sample sizes, because the asymptotic properties of maximum likelihood estimation theory may not hold (10).
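For concreteness, the standard fixed-effect inverse-variance scheme used in meta-analysis pools per-silo effect estimates by precision. The sketch below (with illustrative numbers, not the method developed in this manuscript) shows how a silo with few carriers of a rare variant contributes a noisy estimate that is effectively down-weighted, which is exactly the regime where point (c) above bites:

```python
import math

def meta_analyze(betas, ses):
    """Fixed-effect inverse-variance meta-analysis of per-silo
    effect estimates (betas) and their standard errors (ses)."""
    weights = [1.0 / se**2 for se in ses]                 # precision weights
    beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))                    # pooled standard error
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))                # two-sided p-value
    return beta, se, p

# Illustrative summary statistics for one variant at three silos; the
# third silo has few carriers, so its estimate is unstable (large SE)
# and receives almost no weight in the pooled result.
betas = [0.12, 0.10, 0.30]
ses = [0.05, 0.06, 0.40]
beta, se, p = meta_analyze(betas, ses)
print(beta, se, p)
```

Note that this pooling only sees each silo's (beta, SE) pair; if the per-silo estimates themselves are biased (e.g., separation under extreme case/control imbalance), the pooled estimate inherits that bias.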
In this manuscript we develop a method that interpolates between centralized and meta-analysis methods. Like meta-studies, our paradigm...