The UK Biobank Pharma Proteomics Project (UKB-PPP) is a collaboration between the UK Biobank (UKB) and thirteen biopharmaceutical companies characterising the plasma proteomic profiles of 54,306 UKB participants. Here, we describe results from the first phase of UKB-PPP, including protein quantitative trait loci (pQTL) mapping of 1,463 proteins that identifies 10,248 primary genetic associations, of which 85% are newly discovered. We also identify independent secondary associations in 92% of cis and 29% of trans loci, expanding the catalogue of genetic instruments for downstream analyses. The study provides an updated characterisation of the genetic architecture of the plasma proteome, leveraging population-scale proteomics to provide novel, extensive insights into trans pQTLs across multiple biological domains. We highlight genetic influences on ligand-receptor interactions and pathway perturbations across a diverse collection of cytokines and complement proteins, and illustrate long-range epistatic effects of ABO blood group and FUT2 secretor status on proteins with gastrointestinal tissue-enriched expression. We demonstrate the utility of these data for drug target discovery by extending the genetically proxied effect of PCSK9 levels on lipid concentrations and on cardiovascular and cerebrovascular diseases, and we additionally disentangle specific genes and proteins perturbed at COVID-19 susceptibility loci. This public-private partnership provides the scientific community with an open-access proteomics resource of unprecedented breadth and depth to help elucidate biological mechanisms underlying genetic discoveries and accelerate the development of novel biomarkers and therapeutics.
Background: Chronic sputum production affects quality of life and is a feature of many respiratory diseases. Identifying the genetic variants associated with chronic sputum production in a disease-agnostic sample could improve understanding of its causes and identify new molecular targets for treatment.
Methods: We conducted a genome-wide association study (GWAS) of chronic sputum production in UK Biobank. Signals meeting genome-wide significance (p < 5×10⁻⁸) were investigated in additional independent studies and fine-mapped, and putative causal genes were identified by gene expression analysis. GWAS of respiratory traits were interrogated to determine whether the signals were driven by existing respiratory disease amongst the cases, and variants were further investigated for wider pleiotropic effects using phenome-wide association studies (PheWAS).
Findings: From a GWAS of 9,714 cases and 48,471 controls, we identified six novel genome-wide significant signals for chronic sputum production, including signals in the Human Leukocyte Antigen (HLA) locus, the chromosome 11 mucin locus (containing MUC2, MUC5AC and MUC5B) and the FUT2 locus. The four common variant associations were supported by independent studies with a combined sample size of up to 2,203 cases and 17,627 controls. The mucin locus signal had previously been reported for association with moderate-to-severe asthma. The HLA signal was fine-mapped to an amino-acid change of threonine to arginine (frequency 36.8%) in HLA-DRB1 (HLA-DRB1*03:147). The signal near FUT2 was associated with expression of several genes, including FUT2, for which the direction of effect was tissue dependent. Our PheWAS identified a wide range of associations, including blood cell traits, liver biomarkers, infections, gastrointestinal and thyroid-associated diseases, and respiratory disease.
Interpretation: Novel signals at the FUT2 and mucin loci suggest that mucin fucosylation may be a driver of chronic sputum production even in the absence of diagnosed respiratory disease, and they provide genetic support for this pathway as a target for therapeutic intervention.
Genome-wide association studies (GWAS) have significantly advanced our understanding of the genetic underpinnings of diseases, but case and control cohort definitions for a given disease can vary between published studies. For example, two GWAS of the same disease using the UK Biobank data set might use different data sources (e.g., self-reported questionnaires or hospital records) or different levels of granularity (e.g., specificity of inclusion criteria) to define cases and controls. The extent to which this variability in cohort definitions affects the results of a GWAS is unclear. In this study, we systematically evaluated the effect of the data sources used for case and control definitions on GWAS findings. Using the UK Biobank, we selected three diseases: glaucoma, migraine, and iron-deficiency anemia. For each disease, we designed 13 GWAS, each using a different combination of data sources to define cases and controls, and then calculated the pairwise genetic correlations between all GWAS for that disease. We found that the data sources used to define cases can have a significant impact on GWAS results, but the extent of this impact depends heavily on the disease in question. This suggests the need for greater scrutiny of how case cohorts are defined for GWAS.
Even modest improvements in the probability of success of selecting drug targets that are ultimately approved can substantially reduce the costs of research and development. Drug targets with human genetic evidence of disease association are twice as likely to lead to approved drugs. A key enabler of identifying and validating these genetically validated targets is access to association results from genome-wide genotyping, whole-exome sequencing, and whole-genome sequencing studies with observable traits (often diseases) across large numbers of individuals. Today, linkage between genotype and real-world data (RWD) provides significant opportunities not only to increase the statistical power of genome-wide association studies by ascertaining additional cases for diseases of interest, but also to improve the diversity and coverage of association studies across the disease phenome. As RWD-genetics linked resources continue to grow in diversity of participants, breadth of data captured, length of observation, and number of participants, there is a growing need to bring together the expertise of RWD experts, clinicians, and experienced geneticists to understand which lessons and frameworks from general research using RWD sources are relevant to improving genetics-driven drug discovery and development. This paper describes new challenges and opportunities for phenotypes enabled by diverse RWD sources, considerations in the use of RWD phenotypes for disease gene identification across the disease phenome, and challenges and opportunities in leveraging RWD phenotypes in target validation. The paper concludes with views on future directions for phenotype development using RWD and on key questions requiring further research and development to advance this nascent field.