The vast majority of genome-wide association studies (GWASs) are performed in Europeans, and their transferability to other populations is dependent on many factors (e.g., linkage disequilibrium, allele frequencies, genetic architecture). As medical genomics studies become increasingly large and diverse, gaining insights into population history and consequently the transferability of disease risk measurement is critical. Here, we disentangle recent population history in the widely used 1000 Genomes Project reference panel, with an emphasis on populations underrepresented in medical studies. To examine the transferability of single-ancestry GWASs, we used published summary statistics to calculate polygenic risk scores for eight well-studied phenotypes. We identify directional inconsistencies in all scores; for example, height is predicted to decrease with genetic distance from Europeans, despite robust anthropological evidence that West Africans are as tall as Europeans on average. To gain deeper quantitative insights into GWAS transferability, we developed a complex trait coalescent-based simulation framework considering effects of polygenicity, causal allele frequency divergence, and heritability. As expected, correlations between true and inferred risk are typically highest in the population from which summary statistics were derived. We demonstrate that scores inferred from European GWASs are biased by genetic drift in other populations even when choosing the same causal variants and that biases in any direction are possible and unpredictable. This work cautions that summarizing findings from large-scale GWASs may have limited portability to other populations using standard approaches and highlights the need for generalized risk prediction methods and the inclusion of more diverse individuals in medical genomics.
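The polygenic risk scores described above are conventionally computed as a weighted sum of effect-allele counts, with the weights taken from GWAS summary statistics. The following is a minimal sketch under that assumption, not the authors' pipeline: the function names, the toy genotype matrix, and the effect sizes are all hypothetical. It also shows how differences in allele frequency alone shift the mean predicted score between populations when the same effect sizes are applied.

```python
import numpy as np


def polygenic_score(dosages, betas):
    """Per-individual score: sum over variants of effect-allele dosage x effect size.

    dosages: (n_individuals, n_variants) array of effect-allele counts (0, 1 or 2).
    betas:   (n_variants,) array of per-allele effect sizes from summary statistics.
    """
    dosages = np.asarray(dosages, dtype=float)
    betas = np.asarray(betas, dtype=float)
    if dosages.ndim != 2 or dosages.shape[1] != betas.shape[0]:
        raise ValueError("dosages must be (n_individuals, n_variants) matching betas")
    return dosages @ betas


def expected_mean_score(effect_allele_freqs, betas):
    """Expected population mean score: the expected dosage at each variant is 2 * p."""
    freqs = np.asarray(effect_allele_freqs, dtype=float)
    return float(2.0 * freqs @ np.asarray(betas, dtype=float))


# Hypothetical toy data: 3 individuals, 4 variants.
toy_dosages = [[0, 1, 2, 1],
               [2, 0, 1, 0],
               [1, 1, 0, 2]]
toy_betas = [0.10, -0.05, 0.20, 0.03]
toy_scores = polygenic_score(toy_dosages, toy_betas)

# Same effect sizes, different (hypothetical) allele frequencies in two populations.
mean_pop_a = expected_mean_score([0.5, 0.5], [0.1, 0.1])
mean_pop_b = expected_mean_score([0.2, 0.5], [0.1, 0.1])
```

In this toy case the two populations differ in expected score only because the first variant has drifted in frequency, even though the effect sizes are identical. A shift of this kind is one way a score can differ between populations without any true difference in the trait.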
Genome-wide association studies (GWAS) have laid the foundation for investigations into the biology of complex traits, drug development and clinical guidelines. However, the majority of discovery efforts are based on data from populations of European ancestry [1-3]. In light of the differential genetic architecture that is known to exist between populations, bias in representation can exacerbate existing disease and healthcare disparities. Critical variants may be missed if they have a low frequency or are completely absent in European populations, especially as the field shifts its attention towards rare variants, which are more likely to be population-specific [4-10]. Additionally, effect sizes and the risk prediction scores derived from them in one population may […]
Background
Hepatitis C virus (HCV) infections occur worldwide and either spontaneously resolve or persist and markedly increase the person’s lifetime risk of cirrhosis and hepatocellular carcinoma. Although HCV persistence occurs more often in persons of African ancestry and in persons with a genetic variant near IL28B, the genetic basis is not well understood.
Objective
To evaluate the host genetic basis for spontaneous resolution of HCV infection.
Design
Two-stage genome-wide association study (GWAS).
Setting
13 international multicenter study sites.
Patients
919 individuals with serum HCV antibodies but no HCV RNA (spontaneous resolution) and 1482 individuals with serum HCV antibodies and RNA (persistence).
Measurements
Frequencies of 792,721 SNPs.
Results
Differences in allele frequencies between persons with spontaneous resolution and persistence were identified on chromosomes 19q13.13 and 6p21.32. On chromosome 19, allele frequency differences localized near IL28B and included rs12979860 (overall per-allele OR = 0.45, P = 2.17 × 10⁻³⁰) and 10 additional SNPs spanning 55,000 base pairs. On chromosome 6, allele frequency differences localized near genes for class II human leukocyte antigens (HLA) and included rs4273729 (overall per-allele OR = 0.59, P = 1.71 × 10⁻¹⁶) near DQB1*03:01 and an additional 116 SNPs spanning 1,090,000 base pairs. The associations on chromosomes 19 and 6 were independent and additive, and explained an estimated 14.9% (95% CI: 8.5–22.6%) of the variation in HCV resolution in individuals of European ancestry and 15.8% (95% CI: 4.4–31.0%) in individuals of African ancestry. Replication of the chromosome 6 SNP, rs4273729, in an additional 746 individuals confirmed the findings (P = 0.015).
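As a worked illustration of what a per-allele odds ratio implies, the sketch below assumes a log-additive model, in which each additional copy of the allele multiplies the odds by the same factor, and assumes that "independent, additive" means the two loci combine without an interaction term. Both are assumptions made for illustration, not the authors' stated analysis, and the function names are hypothetical. The odds ratios are the two reported above, taken as given for whichever allele and outcome the authors coded.

```python
def odds_ratio_for_genotype(per_allele_or, copies):
    """Odds ratio relative to zero copies, assuming a log-additive model."""
    if copies not in (0, 1, 2):
        raise ValueError("copies must be 0, 1 or 2")
    return per_allele_or ** copies


def combined_odds_ratio(per_allele_ors, copies_per_locus):
    """Combine independent loci by multiplying their odds ratios (no interaction)."""
    result = 1.0
    for per_allele_or, copies in zip(per_allele_ors, copies_per_locus):
        result *= odds_ratio_for_genotype(per_allele_or, copies)
    return result


OR_CHR19 = 0.45  # rs12979860, as reported
OR_CHR6 = 0.59   # rs4273729, as reported

# Two copies at the chromosome 19 SNP, relative to zero copies: 0.45 ** 2
two_copies_chr19 = odds_ratio_for_genotype(OR_CHR19, 2)

# One copy at each SNP, relative to zero copies at both: 0.45 * 0.59
one_copy_each = combined_odds_ratio([OR_CHR19, OR_CHR6], [1, 1])
```

Under these assumptions, two copies at the chromosome 19 SNP correspond to an odds ratio of about 0.20, and one copy at each SNP to about 0.27.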
Limitations
Epigenetic effects were not studied.
Conclusions
IL28B and HLA class II are independently associated with spontaneous resolution of HCV infection, and SNPs marking IL28B and DQB1*03:01 may explain approximately 15% of the variation in spontaneous resolution of HCV infection.