Polygenic scores (PGS) have been widely used to predict complex traits and risk of diseases using variants identified from genome-wide association studies (GWASs). To date, most GWASs have been conducted in populations of European ancestry, which limits the use of GWAS-derived PGS in non-European populations. Here, we develop a new theory to predict the relative accuracy (RA, relative to the accuracy in populations of the same ancestry as the discovery population) of PGS across ancestries. We used simulations and real data from the UK Biobank to evaluate our results. We found across various simulation scenarios that the RA of PGS based on trait-associated SNPs can be predicted accurately by modelling linkage disequilibrium (LD), minor allele frequencies (MAF), cross-population correlations of SNP effect sizes and heritability. Altogether, we find that LD and MAF differences between ancestries alone explain up to ~70% of the loss of RA when using European-based PGS in individuals of African ancestry for traits like body mass index and height.
Our results suggest that causal variants underlying common genetic variation identified in European ancestry GWASs are mostly shared across continents.

Polygenic scores (PGS, also known as PRS when applied to diseases) are now routinely used to predict complex traits and risk of diseases from findings of genome-wide association studies (GWASs). Over recent years, the predictive performance of PGS has steadily increased with GWAS sample sizes, as predicted by theory 1 . However, the over-representation of European ancestry in the majority of GWASs has been shown to yield an unbalanced improvement in PGS prediction accuracy, to the disadvantage of non-European ancestry populations 2,3 . For example, Duncan et al. 2 report the average accuracy of PGS across multiple traits to be ~64% lower in individuals of African ancestry than in individuals of European ancestry. Similarly, Martin et al. 3 report, across multiple traits, reductions of PGS accuracy relative to European ancestry of ~37%, ~50% and ~78% in individuals of South-Asian, East-Asian and African ancestries, respectively. Although increasingly emphasised in the recent GWAS literature, the "loss of accuracy" problem is not entirely new. Indeed, a number of studies in the animal breeding literature have previously reported lower accuracy of genomic selection across genetically distant breeds 4,5 , consistent with the observation of limited transferability of GWAS findings across diverse human populations 6,7 . These studies also highlight major factors influencing that loss, such as differences between populations in causal variant effect sizes, in allele frequencies and in linkage disequilibrium (LD) between causal variants and SNPs assayed in GWAS 6,8,9 . To illustrate the latter point, consider a SNP whose LD r² with a causal variant equals 0.8 in the discovery population and 0.6 in the target population. 
Such a SNP would therefore explain 25% (= 1 - 0.6/0.8) less trait variation and thus be less predictive in t...