[Background]
Heart failure (HF) is a major cause of death globally, and earlier initiation of treatment could mitigate disease progression. Multiple efforts have used genome-wide association studies (GWAS) or electronic health records (EHR) to identify individuals at high risk of HF. However, integrating both sources into HF prediction models, using novel natural language processing (NLP) techniques and large-scale global genetic predictors, has not been evaluated.
[Objectives]
The study aimed to improve the accuracy of HF prediction by integrating GWAS- and EHR-derived risk scores.
[Methods]
We created a polygenic risk score (PRS) from the largest HF GWAS to date, which we previously performed within the Global Biobank Meta-analysis Initiative and which includes 974,174 samples (51,274 cases; 5%) from 9 biobanks across the world. To extract information from the high-dimensional Michigan Medicine EHR (N=61,849 subjects), we treated diagnosis codes as words and applied NLP to the data, learning code co-occurrence patterns and extracting 350 latent phenotypes (low-dimensional features) that represent 29,346 EHR codes. We then regressed HF on the latent phenotypes in an independent cohort and used the coefficients as weights to calculate a clinical risk score (ClinRS). Model performance was compared between a baseline model (age and sex) and three models with risk scores added: 1) PRS, 2) ClinRS, and 3) PRS+ClinRS, using the 10-fold cross-validated Area Under the Receiver Operating Characteristic Curve (AUC).
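As a minimal sketch of the scoring and evaluation steps described above, the Python below computes a ClinRS-style score as a weighted sum of latent phenotype values and evaluates a set of scores with a rank-based AUC. It is an illustration under stated assumptions, not the study's implementation: the function names, the three toy latent phenotypes, the weights, and the patient data are all hypothetical, and the regression fit, the NLP embedding step, and the 10-fold cross-validation are omitted.

```python
from typing import Sequence


def clinical_risk_score(latent: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of latent phenotype values (hypothetical ClinRS-style score).

    `weights` stands in for regression coefficients estimated in an
    independent cohort; here they are made-up numbers.
    """
    if len(latent) != len(weights):
        raise ValueError("latent and weights must have the same length")
    return sum(x * w for x, w in zip(latent, weights))


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (case, control) pairs in which the case scores
    higher than the control; tied pairs count as one half."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Toy data: 4 patients, 3 latent phenotypes (the study used 350).
toy_weights = [0.8, -0.2, 0.5]           # hypothetical coefficients
toy_patients = [
    [1.0, 0.0, 1.0],
    [0.2, 1.0, 0.1],
    [0.9, 0.5, 0.4],
    [0.1, 0.2, 0.0],
]
toy_labels = [1, 0, 1, 0]                # 1 = HF case, 0 = control

toy_scores = [clinical_risk_score(p, toy_weights) for p in toy_patients]
toy_auc = auc(toy_labels, toy_scores)
print(f"toy AUC: {toy_auc:.2f}")
```

The pairwise definition of AUC is used here because it needs no external libraries and makes the interpretation explicit: the probability that a randomly chosen case receives a higher score than a randomly chosen control. For cohorts of realistic size, a sorted-rank implementation or an established statistics library would be the more practical choice.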
[Results]
PRS and ClinRS each predicted HF outcomes significantly better than the baseline model, up to eight years prior to HF diagnosis. AUCs (95% CI) were higher for the PRS model (0.76 [0.74-0.78]) and the ClinRS model (0.77 [0.74-0.79]) than for the baseline model (0.71 [0.68-0.73]). Moreover, including both PRS and ClinRS in the model yielded superior performance in predicting HF up to ten years prior to HF diagnosis (AUC: 0.79 [0.77-0.82]), 2-3 years earlier than with either risk predictor alone.
[Conclusions]
We demonstrate the additive power of integrating GWAS- and EHR-derived risk scores to predict HF cases prior to diagnosis. Clinical application of this approach may allow identification of patients with higher susceptibility to HF and enable preventive therapies to be initiated at an earlier stage.