Polygenic scores (PGS) have limited portability across different groupings of individuals (e.g., by genetic ancestries and/or social determinants of health), preventing their equitable use. PGS portability has typically been assessed using a single aggregate population-level statistic (e.g., R²), ignoring inter-individual variation within the population. Here we evaluate PGS accuracy at individual-level resolution, independent of annotated genetic ancestry. We show that PGS accuracy varies between individuals across the genetic ancestry continuum in all ancestries, even within traditionally "homogeneous" genetic ancestry clusters. Using a large and diverse Los Angeles biobank (ATLAS, N = 36,778) along with the UK Biobank (UKBB, N = 487,409), we show that PGS accuracy decreases along a continuum of genetic ancestries in all considered populations and that the trend is well captured by a continuous measure of genetic distance (GD) from the PGS training data: the Pearson correlation between GD and PGS accuracy, averaged across 84 traits, is -0.95. When applying PGS models trained in UKBB "white British" individuals to European-ancestry individuals of ATLAS, individuals in the highest GD decile have 14% lower accuracy relative to the lowest decile; notably, the lowest GD decile of Hispanic/Latino American ancestry individuals showed PGS performance similar to that of the highest GD decile of European-ancestry ATLAS individuals. GD is significantly correlated with PGS estimates themselves for 82 out of 84 traits, further emphasizing the importance of incorporating the continuum of genetic ancestry in PGS interpretation. Our results highlight the need to move away from discrete genetic ancestry clusters towards the continuum of genetic ancestries when considering PGS and their applications.
One of the key questions regarding COVID-19 vaccines is whether they can reduce viral shedding. To date, Israel has vaccinated a substantial part of its adult population, which enables extracting real-world signals. The vaccination rollout started on Dec 20th 2020, utilized mainly the BNT162b2 vaccine, and focused on individuals who are 60 years or older. By now, more than 75% of the individuals in this age group are at least 14 days past their first dose, compared to 25% of the individuals aged 40-60 years. Here, we traced the Ct value distribution of 16,297 positive qPCR tests run in our lab from Dec 1st to Jan 31st that came from these two age groups. As we do not have access to the vaccination status of each test, our hypothesis was that if vaccines reduce viral load, we should see a difference in the Ct values between these two age groups in late January but not before. Consistent with this hypothesis, until Jan 15th, we did not find any statistically significant differences in the average Ct value between the groups. In stark contrast, our results in the last two weeks of January show a significant weakening in the average Ct value of 60+ individuals relative to the 40-60 group. To further corroborate these results, we also used a series of nested linear models to explain the Ct values of the positive tests. This analysis favored a model that included an interaction between age and the late January time period, consistent with the effect of vaccination. We then used demographic data and the daily vaccination rates to estimate the effect of vaccination on viral load reduction. Our estimate suggests that vaccination reduces the viral load by 1.6x to 20x in individuals who are positive for SARS-CoV-2. This estimate might improve after more individuals receive the second dose. Taken together, our findings indicate that vaccination is not only important for individual protection but can also reduce transmission.
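The age-by-period interaction described in this abstract can be illustrated with a small difference-in-differences calculation. This is only a sketch, not the authors' nested regression: it assumes two age groups and two time periods, in which case the difference below equals the interaction coefficient of a saturated two-by-two linear model. The function names and the toy Ct values are hypothetical, and the conversion from a Ct shift to a fold change assumes roughly one doubling per PCR cycle.

```python
from statistics import mean


def interaction_effect(records):
    """Difference-in-differences of mean Ct between age groups and periods.

    records: iterable of (age_group, period, ct) tuples, where age_group is
    "40-60" or "60+" and period is "early" or "late" (labels are assumptions).
    """
    cells = {}
    for age_group, period, ct in records:
        cells.setdefault((age_group, period), []).append(ct)
    cell_mean = {key: mean(values) for key, values in cells.items()}
    early_gap = cell_mean[("60+", "early")] - cell_mean[("40-60", "early")]
    late_gap = cell_mean[("60+", "late")] - cell_mean[("40-60", "late")]
    return late_gap - early_gap


def ct_shift_to_fold_change(delta_ct, amplification=2.0):
    """Convert a Ct increase into a fold reduction in viral load.

    Assumes the template multiplies by `amplification` every cycle
    (2.0 corresponds to ideal PCR efficiency).
    """
    return amplification ** delta_ct


# Hypothetical toy data, invented for illustration only.
toy_records = [
    ("40-60", "early", 25.0), ("40-60", "early", 27.0),
    ("60+", "early", 26.0), ("60+", "early", 26.0),
    ("40-60", "late", 25.0), ("40-60", "late", 27.0),
    ("60+", "late", 27.0), ("60+", "late", 29.0),
]

delta = interaction_effect(toy_records)
print(delta, ct_shift_to_fold_change(delta))
```

A positive interaction means the 60+ group's Ct rose relative to the 40-60 group only in the late period, which is the pattern the abstract attributes to vaccination.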
Polygenic scores (PGSs) have limited portability across different groupings of individuals (for example, by genetic ancestries and/or social determinants of health), preventing their equitable use [1–3]. PGS portability has typically been assessed using a single aggregate population-level statistic (for example, R²) [4], ignoring inter-individual variation within the population. Here, using a large and diverse Los Angeles biobank [5] (ATLAS, n = 36,778) along with the UK Biobank [6] (UKBB, n = 487,409), we show that PGS accuracy decreases individual-to-individual along the continuum of genetic ancestries [7] in all considered populations, even within traditionally labelled ‘homogeneous’ genetic ancestries. The decreasing trend is well captured by a continuous measure of genetic distance (GD) from the PGS training data: Pearson correlation of −0.95 between GD and PGS accuracy averaged across 84 traits. When applying PGS models trained on individuals labelled as white British in the UKBB to individuals with European ancestries in ATLAS, individuals in the furthest GD decile have 14% lower accuracy relative to the closest decile; notably, the closest GD decile of individuals with Hispanic Latino American ancestries show similar PGS performance to the furthest GD decile of individuals with European ancestries. GD is significantly correlated with PGS estimates themselves for 82 of 84 traits, further emphasizing the importance of incorporating the continuum of genetic ancestries in PGS interpretation. Our results highlight the need to move away from discrete genetic ancestry clusters towards the continuum of genetic ancestries when considering PGSs.
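The decile comparison and the GD-accuracy correlation reported above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's pipeline: per-individual GD and accuracy values are taken as given (how the study computes them is not reproduced here), the function names are hypothetical, and the input data are synthetic.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def bin_by_distance(gd, accuracy, n_bins=10):
    """Sort individuals by GD, split into n_bins groups, average each group.

    Returns a list of (mean_gd, mean_accuracy) pairs, closest bin first.
    """
    order = sorted(range(len(gd)), key=lambda i: gd[i])
    n = len(order)
    summary = []
    for b in range(n_bins):
        members = order[b * n // n_bins:(b + 1) * n // n_bins]
        summary.append((mean(gd[i] for i in members),
                        mean(accuracy[i] for i in members)))
    return summary


# Synthetic example: accuracy declines linearly with distance (assumed values).
gd_values = [i / 100 for i in range(100)]
acc_values = [0.5 - 0.2 * d for d in gd_values]

bins = bin_by_distance(gd_values, acc_values)
r = pearson([b[0] for b in bins], [b[1] for b in bins])
# Relative accuracy loss of the furthest bin compared with the closest bin.
relative_drop = 1 - bins[-1][1] / bins[0][1]
print(r, relative_drop)
```

Binning before correlating mirrors the aggregate, decile-level reporting in the abstract; a correlation computed on unbinned individuals would generally be weaker because of individual-level noise.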
Finding familial relatives using DNA has multiple applications in genetic genealogy, population genetics, and forensics. So far, most relative matching algorithms rely on detecting identity-by-descent (IBD) segments with high-quality genotype data. Recently, low coverage sequencing (LCS) has received growing attention as a promising cost-effective method to ascertain genomic information. However, given its higher error rates, it is unclear whether existing IBD detection can work on LCS datasets. Here, we developed and tested a framework for relative matching using sequencing with 1× coverage (1×LCS). We started by exploring the error characteristics of this method compared to array data. Our results show that after some optimization 1×LCS can exhibit the same genotyping discordance rates as the discordance between two array platforms. Using this observation, we developed a hybrid framework for relative matching and tuned this framework with >2,700 pairs of confirmed genealogical relatives that were genotyped using heterogeneous datasets. We then obtained array and 1×LCS data on 19 samples and used our framework to find relatives in a database of over 3 million individuals. The total length of shared segments obtained by 1×LCS was virtually indistinguishable from that obtained by genotyping arrays for matches with a total sharing >200 cM (second cousins or closer). For more distant relatives, as long as those were detected by both technologies, the total length obtained by LCS and by genotyping arrays was highly correlated, with no evidence of over- or underestimation. Taken together, our results show that 1×LCS can be a valid alternative to arrays for relative matching, opening the possibility for further democratization of genomic data.
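The comparison of total shared length between the two technologies can be illustrated with a short sketch. It assumes IBD segments have already been detected and are given as (start, end) positions in centimorgans; the segment values, function names, and data layout are hypothetical, and only the 200 cM threshold comes from the abstract.

```python
def total_shared_cm(segments):
    """Sum the genetic length (in cM) of a list of (start_cm, end_cm) segments."""
    return sum(end - start for start, end in segments)


def relative_difference(lcs_total, array_total):
    """Signed difference of the LCS total relative to the array total."""
    return (lcs_total - array_total) / array_total


# Hypothetical segments for one pair of relatives, one list per technology.
array_segments = [(0.0, 120.0), (10.0, 100.0)]
lcs_segments = [(0.0, 118.0), (10.0, 101.0)]

array_total = total_shared_cm(array_segments)
lcs_total = total_shared_cm(lcs_segments)

CLOSE_RELATIVE_CM = 200.0  # threshold quoted in the abstract
is_close = array_total > CLOSE_RELATIVE_CM and lcs_total > CLOSE_RELATIVE_CM
print(array_total, lcs_total, relative_difference(lcs_total, array_total), is_close)
```

A relative difference near zero across many pairs, with no consistent sign, would correspond to the "no evidence of over- or underestimation" finding.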
An individual's disease risk is affected by the populations that they belong to, due to shared genetics and shared environment. The study of fine-scale populations in clinical care will be important for reducing health disparities and for developing personalized treatments. In this work, we developed a novel health monitoring system, which leverages biobank data and electronic medical records from over 40,000 UCLA patients. Using identity by descent (IBD), we analyzed one type of fine-scale population, an IBD cluster. In total, we identified 376 IBD clusters, including clusters characterized by the presence of many significantly understudied communities, such as Lebanese Christians, Iranian Jews, Armenians, and Gujaratis. Our analyses identified thousands of novel associations between IBD clusters and clinical diagnoses, physician offices, utilization of specific medical specialties, pathogenic allele frequencies, and changes in diagnosis frequency over time. To enhance the impact of the research and engage the broader community, we provide a web portal to query our results: www.ibd.la