Admixed populations are routinely excluded from medical genomic studies due to concerns over population structure. Here, we present a statistical framework and software package, Tractor, to facilitate the inclusion of admixed individuals in association studies by leveraging local ancestry. We test Tractor with simulations and empirical data focused on admixed African-European individuals. Tractor generates ancestry-specific effect size estimates, can boost GWAS power, and improves the resolution of association signals.
Using a local ancestry-aware regression model, we replicate known hits for blood lipids in admixed populations, discover novel hits missed by standard GWAS procedures, and localize signals closer to putative causal variants.

Introduction

Admixed groups, whose genomes derive from more than one ancestral population, such as African American and Hispanic/Latino individuals, make up more than a third of the US populace, and the population is becoming increasingly mixed over time [1]. Many common, heritable diseases, including prostate cancer [2-5], asthma [6-9], and several cardiovascular disorders such as atherosclerosis [10,11], are enriched in admixed populations of the US. However, only a minute proportion of association studies address the genetic architecture of complex traits in such groups [12,13]; admixed individuals are systematically removed from many studies because methods and pipelines that effectively account for their ancestry are lacking, leaving population substructure free to infiltrate analyses and bias results [14-21]. Large-scale efforts to collect genetic data alongside medically relevant phenotypes are beginning to focus more on non-Eurasian ethnic groups that contain higher amounts of admixture [22-27], motivating the timely development of scalable methods to allow well-calibrated statistical genomic work on these populations. If not addressed, this inability to analyze admixed people will limit the clinical utility of large-scale data-collection efforts for minorities, exacerbating the concerning health disparities that already exist [28-32].
In GWAS, the specific concern about including admixed participants is false positive hits that arise because alleles are at different frequencies across populations. Most studies currently attempt to control for this by using principal components (PCs) in a linear or linear mixed model framework. However, PCs capture broad, genome-wide admixture fractions, and individuals' local ancestry makeup may differ between case and control cohorts even if their global fractions are identical. Including PCs as covariates therefore still leaves open the possibility of false positive associations, and can also absorb power.
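The distinction between global and local ancestry can be made concrete with a toy example. The sketch below is purely illustrative and is not part of Tractor: the function names, the ten-locus genome, and the ancestry-specific allele frequencies (0.8 and 0.1) are all hypothetical values chosen for clarity. It builds cases and controls with identical global ancestry fractions but opposite local ancestry at one focal locus, and shows that the expected allele dosage at that locus then differs between the groups even though a global-ancestry covariate would not distinguish them.

```python
import numpy as np

def global_ancestry_fraction(local_anc):
    """Genome-wide fraction of ancestry A for each individual.

    local_anc: (individuals x loci) array holding the number of
    ancestry-A haplotypes (0, 1, or 2) at each locus.
    """
    return local_anc.mean(axis=1) / 2.0

def expected_dosage(local_anc_at_locus, freq_a, freq_b):
    """Expected allele dosage at one locus given local ancestry counts.

    Haplotypes of ancestry A carry the allele with probability freq_a,
    haplotypes of ancestry B with probability freq_b.
    """
    return local_anc_at_locus * freq_a + (2 - local_anc_at_locus) * freq_b

# Hypothetical toy genome: 10 loci, 4 cases and 4 controls.
n_loci = 10
focal = 0
cases = np.zeros((4, n_loci), dtype=int)
controls = np.zeros((4, n_loci), dtype=int)
cases[:, :5] = 2      # ancestry A on the first five loci (includes the focal locus)
controls[:, 5:] = 2   # the same amount of ancestry A, on the last five loci

case_global = global_ancestry_fraction(cases)
control_global = global_ancestry_fraction(controls)

# Hypothetical ancestry-specific frequencies of the tested allele.
freq_a, freq_b = 0.8, 0.1
case_dosage = expected_dosage(cases[:, focal], freq_a, freq_b).mean()
control_dosage = expected_dosage(controls[:, focal], freq_a, freq_b).mean()

print("global ancestry (cases, controls):", case_global.mean(), control_global.mean())
print("expected dosage at focal locus (cases, controls):", case_dosage, control_dosage)
```

In this construction the allele has no effect on the phenotype, yet its frequency differs between cases and controls purely because of local ancestry at the focal locus. This is the kind of confounding that a genome-wide summary such as a PC cannot remove on its own.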
Studying diverse populations in gene discovery efforts not only reduces disparities but also benefits genetic analysis for individuals of all ancestries. Perhaps the most notable example of this is in multi-ethnic fine-mapping, which ca...