Whole genome sequencing analysis of the cardiometabolic proteome

Gilly, Arthur; Park, Young‐Chan; Png, Grace; Barysenka, Andrei; Fischer, Iris; Bjørnland, Thea; Southam, Lorraine; Süveges, Dániel; Neumeyer, Sonja; Rayner, Nigel W.; Tsafantakis, Emmanouil; Karaleftheri, Maria; Dedoussis, George; Zeggini, Eleftheria

doi:10.1101/854752

Cited by 5 publications (7 citation statements)

References 52 publications


“…To evaluate the effect of diverse ancestry, we obtained statistics for a single study of 15 hematological traits from meta-analysis of five global populations [23]. To evaluate the effect of direct sequencing vs genotype imputation, we obtained statistics from two additional studies reporting WGS-derived summary statistics [24,25]. We selected significant regions, merged with GTEx Whole Blood summary statistics, ran coloc and POEMColoc as described above for the UKBB summary statistics.…”

Section: Evaluating POEMColoc on GWAS hits and GTEx eQTL (mentioning; confidence 99%)
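The excerpt describes merging WGS-derived summary statistics with GTEx Whole Blood summary statistics before running coloc. A minimal sketch of that merge step follows, assuming a hypothetical row schema (`SumStat`) keyed by chromosome and position, with one record per position. It aligns eQTL effects to the GWAS effect allele by flipping the sign when alleles are swapped; strand-ambiguous (A/T, C/G) variants and multi-allelic sites are not handled, and this is not the cited authors' pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SumStat:
    """One row of GWAS or eQTL summary statistics (hypothetical minimal schema)."""
    chrom: str
    pos: int
    effect_allele: str
    other_allele: str
    beta: float
    se: float


def merge_and_harmonize(gwas, eqtl):
    """Inner-join two lists of SumStat on (chrom, pos) and express every eQTL
    effect relative to the GWAS effect allele. Assumes one record per position."""
    eqtl_by_pos = {(s.chrom, s.pos): s for s in eqtl}
    merged = []
    for g in gwas:
        e = eqtl_by_pos.get((g.chrom, g.pos))
        if e is None:
            continue  # variant not tested in the eQTL data
        if (e.effect_allele, e.other_allele) == (g.effect_allele, g.other_allele):
            eqtl_beta = e.beta
        elif (e.effect_allele, e.other_allele) == (g.other_allele, g.effect_allele):
            eqtl_beta = -e.beta  # swapped alleles: flip the sign of the eQTL effect
        else:
            continue  # alleles do not match at all: drop the variant
        merged.append((g.chrom, g.pos, g.beta, g.se, eqtl_beta, e.se))
    return merged
```

The merged tuples (GWAS beta and SE, harmonized eQTL beta and SE) are the per-variant inputs a coloc-style analysis needs for both traits.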

Estimating colocalization probability from limited summary statistics

et al. 2021


Background: Colocalization is a statistical method used in genetics to determine whether the same variant is causal for multiple phenotypes, for example, complex traits and gene expression. It provides stronger mechanistic evidence than shared significance, which can be produced by separate causal variants in linkage disequilibrium. Current colocalization methods require full summary statistics for both traits, limiting their use with the majority of reported GWAS associations (e.g. those in the GWAS Catalog). We propose a new approximation to the popular coloc method that can be applied when only limited summary statistics are available. Our method (POint EstiMation of Colocalization, POEMColoc) imputes missing summary statistics for one or both traits using LD structure in a reference panel, and performs colocalization using the imputed summary statistics. Results: We evaluate the performance of POEMColoc using real (UK Biobank phenotypes and GTEx eQTL) and simulated datasets. We show good correlation between posterior probabilities of colocalization computed from imputed and observed datasets, and similar accuracy in simulation. We evaluate scenarios that might reduce performance and show that multiple independent causal variants in a region and imputation from a limited subset of typed variants have a larger effect, while mismatched ancestry in the reference panel has a modest effect. Further, we find that POEMColoc is a better approximation of coloc when the imputed association statistics are from a well-powered study (e.g., relatively larger sample size or effect size). Applying POEMColoc to estimate colocalization of GWAS Catalog entries and GTEx eQTL, we find evidence for colocalization of 150,000 trait-gene-tissue triplets. Conclusions: We find that colocalization analysis performed with full summary statistics can be closely approximated when only the summary statistics of the top SNP are available for one or both traits. When applied to the full GWAS Catalog and GTEx eQTL, we find that colocalized trait-gene pairs are enriched in tissues relevant to disease etiology and for matches to approved drug mechanisms. The POEMColoc R package is available at https://github.com/AbbVie-ComputationalGenomics/POEMColoc.
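The abstract combines two ideas: imputing a region's association statistics from the top SNP via reference-panel LD, then running the coloc model on the result. The sketch below illustrates both under stated assumptions and is not the POEMColoc R package's API. It assumes a single causal variant, so the expected z-score at SNP j is taken as r(j, top) × z(top); the per-SNP effect variance uses a quantitative-trait approximation with unit phenotype SD; Wakefield approximate Bayes factors use a prior effect SD of 0.15; and the priors p1 = p2 = 1e-4, p12 = 1e-5 follow commonly used coloc defaults. The LD values, sample sizes and MAF in the example are made up.

```python
import math


def impute_z_from_top(z_top, r_with_top):
    """Assumed single-causal-variant approximation: E[z_j] ~= r_(j,top) * z_top,
    where r_(j,top) is the reference-panel LD correlation with the top SNP."""
    return [r * z_top for r in r_with_top]


def approx_var_beta(n, maf, sd_y=1.0):
    """Approximate sampling variance of a per-allele effect (quantitative trait)."""
    return sd_y ** 2 / (2.0 * n * maf * (1.0 - maf))


def log_abf(z, v, prior_sd=0.15):
    """Wakefield log approximate Bayes factor for one SNP."""
    w = prior_sd ** 2
    r = w / (v + w)
    return 0.5 * (math.log(1.0 - r) + r * z * z)


def logsumexp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def logdiff(a, b):
    """log(exp(a) - exp(b)); returns -inf when b >= a."""
    if b >= a:
        return float("-inf")
    return a + math.log1p(-math.exp(b - a))


def coloc_posteriors(labf1, labf2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities of H0..H4 under the single-causal-variant coloc model."""
    s1 = logsumexp(labf1)
    s2 = logsumexp(labf2)
    s12 = logsumexp([a + b for a, b in zip(labf1, labf2)])
    lh = [
        0.0,  # H0: neither trait associated
        math.log(p1) + s1,  # H1: trait 1 only
        math.log(p2) + s2,  # H2: trait 2 only
        math.log(p1) + math.log(p2) + logdiff(s1 + s2, s12),  # H3: distinct causal SNPs
        math.log(p12) + s12,  # H4: one shared causal SNP
    ]
    denom = logsumexp(lh)
    return [math.exp(x - denom) for x in lh]


# Example: three SNPs, both traits report only the same top SNP (made-up values).
r_top = [1.0, 0.8, 0.1]
z1 = impute_z_from_top(8.0, r_top)
z2 = impute_z_from_top(6.0, r_top)
labf1 = [log_abf(z, approx_var_beta(10000, 0.3)) for z in z1]
labf2 = [log_abf(z, approx_var_beta(500, 0.3)) for z in z2]
pp = coloc_posteriors(labf1, labf2)  # pp[4] is the posterior for a shared variant
```

Working in log space avoids overflow, since a strong signal gives log Bayes factors in the tens. With only one SNP in a region, H3 is impossible and `logdiff` returns -inf, which becomes a posterior of exactly zero.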



“…As samples are easy to store, collection is minimally invasive for study participants, and hundreds to thousands of molecules can be measured, plasma proteins have been investigated as biomarkers for numerous diseases [1]. The recent advances in targeted proteomics technologies have allowed thousands of circulating plasma protein levels to be measured simultaneously, even in large sample sizes [2-9]. Uncovering relationships between protein biomarkers and disease has the potential to aid in prediction of risk, diagnosis and development of new therapies for disease [10].…”

Section: Introduction (mentioning; confidence 99%)

“…Previous large GWAS of plasma protein levels have discovered hundreds of associated loci, uncovered mechanisms for pQTL, causal relationships between proteins and diseases and posited how plasma protein levels may act to influence disease risk [3,4,7,8,17,18,20]. In order to maximise the potential for pQTL discovery and MR to find causal associations with disease and build on previous work, we performed genome-wide meta-analysis with the largest sample size for 184 cardiovascular-related plasma proteins.…”

Section: Introduction (mentioning; confidence 99%)
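The excerpt refers to a genome-wide meta-analysis of per-cohort pQTL results. The cited work's exact meta-analysis model is not given here; a common choice, assumed for this sketch, is fixed-effect inverse-variance weighting, applied to one variant at a time across cohorts.

```python
import math


def ivw_fixed_effect_meta(betas, ses):
    """Fixed-effect inverse-variance weighted meta-analysis of one variant's
    effect estimates (betas) and standard errors (ses) across cohorts."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return beta, se, z, p


# Example: two cohorts with equal precision (made-up numbers).
beta, se, z, p = ivw_fixed_effect_meta([0.1, 0.3], [0.1, 0.1])
```

Each cohort is weighted by the inverse of its sampling variance, so larger, better-measured cohorts dominate the pooled estimate. Pooling is what raises the effective sample size for pQTL discovery.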

Mapping genetic determinants of 184 circulating proteins in 26,494 individuals to connect proteins and diseases

Macdonald-Dunlop; Klarić; Folkersen; et al. 2021

Preprint


We performed the largest genome-wide meta-analysis (GWAMA) (Max N=26,494) of the levels of 184 cardiovascular-related plasma proteins to date and reported 592 independent loci (pQTL) associated with the level of at least one protein (1308 significant associations, median 6 per protein). We estimated, using multiple methods, that only 8-37% of testable pQTL overlap with established expression quantitative trait loci (eQTL), while 132 out of 1064 lead variants show evidence for transcription factor binding, and we found that 75% of our pQTL are known DNA methylation QTL. We highlight the variation in genetic architecture between proteins and that proteins share genetic architecture with cardiometabolic complex traits. Using cis-instrument Mendelian randomisation (MR), we infer causal relationships for 11 proteins, recapitulating the previously reported relationship between PCSK9 and LDL cholesterol, replicating previous pQTL MR findings and discovering 16 causal relationships between protein levels and disease. Our MR results highlight IL2-RA as a candidate for drug repurposing for Crohn’s Disease as well as 2 novel therapeutic targets: IL-27 (Crohn’s disease) and TNFRSF14 (Inflammatory bowel disease, Multiple sclerosis and Ulcerative colitis). We have demonstrated the discoveries possible using our pQTL and highlight the potential of this work as a resource for genetic epidemiology.
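The abstract reports cis-instrument Mendelian randomisation with protein level as the exposure and disease as the outcome, but does not name the estimator. The sketch assumes two common ones: the Wald ratio for a single instrument (with the first-order standard error) and fixed-effect inverse-variance weighting across several instruments. IVW as written assumes the instruments are uncorrelated, which for variants in one cis region usually means LD pruning beforehand.

```python
import math


def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Causal estimate from a single instrument; the SE keeps only the
    first-order (delta-method) term, ignoring uncertainty in beta_exposure."""
    estimate = beta_outcome / beta_exposure
    se = abs(se_outcome / beta_exposure)
    return estimate, se


def ivw_mr(bx, by, se_y):
    """Fixed-effect inverse-variance weighted MR across independent instruments.
    bx: variant-exposure effects, by: variant-outcome effects, se_y: SEs of by."""
    num = sum(x * y / s ** 2 for x, y, s in zip(bx, by, se_y))
    den = sum(x ** 2 / s ** 2 for x, s in zip(bx, se_y))
    return num / den, math.sqrt(1.0 / den)


# Example (made-up numbers): one pQTL raising the protein by 0.5 SD
# and the outcome log-odds by 0.1.
est, se = wald_ratio(0.5, 0.1, 0.02)
```

With a single instrument the IVW estimate reduces to the Wald ratio, which is a convenient sanity check when moving from one cis-pQTL to several.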


“…An important issue linked to blood analysis is the underlying effect of genetics to determine stable differences in protein levels between individuals. The levels of blood proteins have previously been determined to be influenced both by genetic and environmental factors, as studied by mass spectrometry-based proteomics [1][2][3][4], nucleic-acid based assays [5][6][7][8], and immuno-based assays [9][10][11][12][13][14]. Effects based on sex [15], specific diets [15], age [16], and infections [17] have also been reported suggesting an important role for quantitative blood protein assays for individualized diagnosis of health and disease.…”

Section: Introduction (mentioning; confidence 99%)

“…Suhre et al [8] analyzed the associations between protein levels and gene variants in a German cohort using SOMAscan platform and Affymetrix Array and identified 57 genetic risk loci for 42 disease end points. The PEA platform has also been used for genetic association studies, such as the identification of 16 pQTLs associated with known biomarkers [9], 79 loci for plasma protein biomarkers in cardiovascular disease [10], 8 cis-pQTL in the InCHIANTI study [11], 41 loci for the plasma levels of neurological proteins [12], and 131 independent sequence variant associations of the cardiometabolic proteome [13]. In addition, Yao et al [14] analyzed the association of protein levels and genetic factors for 16,000 pQTL variants in more than 6000 individuals in the Framingham Heart Study using Luminex multiplex immunoassays and identified 13 proteins harboring pQTL variants that match coronary disease-risk variants from GWAS.…”

Section: Introduction (mentioning; confidence 99%)

Whole-genome sequence association analysis of blood proteins in a longitudinal wellness cohort

et al. 2020


Background: The human plasma proteome is important for many biological processes and is a source of targets for diagnostics and therapy. It is therefore of great interest to understand the interplay of genetic and environmental factors that determine the specific protein levels in individuals, and to gain a deeper insight into the importance of genetic architecture for the individual variability of plasma protein levels during adult life. Methods: We have combined whole-genome sequencing, multiplex plasma protein profiling, and extensive clinical phenotyping in a longitudinal 2-year wellness study of 101 healthy individuals with repeated sampling. Analyses of genetic and non-genetic associations related to the variability of blood levels of proteins in these individuals were performed. Results: The analyses showed that each individual has a unique protein profile, and we report on the intra-individual as well as inter-individual variation for 794 plasma proteins. A genome-wide association study (GWAS) using 7.3 million genetic variants identified by whole-genome sequencing revealed 144 independent variants across 107 proteins that showed strong association (P < 6 × 10⁻¹¹) between genetics and the inter-individual variability in protein levels. Of these 107 proteins, 67 had not previously been reported to have plasma levels affected by genetics. Our longitudinal analysis further demonstrates that these levels are stable during the 2-year study period. The variability of protein profiles as a consequence of environmental factors was also analyzed with focus on the effects of weight loss and infections. Conclusions: We show that the adult blood levels of many proteins are determined at birth by genetics, which is important for efforts aimed to understand the relationship between plasma proteome profiles and human biology and disease.
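The reported significance level of P < 6 × 10⁻¹¹ is numerically consistent with a Bonferroni correction of the conventional genome-wide threshold 5 × 10⁻⁸ across the 794 proteins tested (5 × 10⁻⁸ / 794 ≈ 6.3 × 10⁻¹¹); this derivation is our reading, not stated in the abstract. The sketch computes that threshold and a per-variant additive association test as plain linear regression of protein level on allele dosage. It omits the covariates and any repeated-measures modelling a real longitudinal analysis would use, and the example data are made up.

```python
import math


def bonferroni_threshold(alpha, n_tests):
    """Split a family-wise alpha evenly across n_tests independent tests."""
    return alpha / n_tests


def additive_association(dosages, phenotype):
    """Simple linear regression of a protein level on allele dosage (0/1/2),
    with no covariates. Returns (beta, se, t)."""
    n = len(dosages)
    mx = sum(dosages) / n
    my = sum(phenotype) / n
    sxx = sum((x - mx) ** 2 for x in dosages)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, phenotype))
    beta = sxy / sxx
    resid = [y - my - beta * (x - mx) for x, y in zip(dosages, phenotype)]
    sigma2 = sum(r * r for r in resid) / (n - 2)  # residual variance, n - 2 df
    se = math.sqrt(sigma2 / sxx)
    return beta, se, beta / se


threshold = bonferroni_threshold(5e-8, 794)  # roughly 6.3e-11
beta, se, t = additive_association([0, 1, 2, 0, 1, 2], [1.0, 1.6, 1.9, 1.1, 1.4, 2.1])
```

A variant would be declared associated with a protein when the p-value from its t statistic falls below `threshold`; dividing by the number of proteins keeps the family-wise error rate across all protein GWAS near 5 × 10⁻⁸.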

