Objective
To improve the performance of a social risk score (a predictive risk model) using electronic health record (EHR) structured and unstructured data.
Materials and Methods
We used Epic-based EHR data from July 2016 to June 2021 and linked them to community-level data from the US Census American Community Survey. We identified predictors of interest within the EHR structured data and applied natural language processing (NLP) techniques to identify patients’ social needs in the EHR unstructured data. We fit logistic regression models without and with information from the unstructured data (Models I and II, respectively) and compared their performance with generalized estimating equation (GEE) models without and with the unstructured data (Models III and IV, respectively).
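As an illustration of how social needs in free-text notes can be turned into model predictors, the following is a minimal sketch of a rule-based approach. It is not the study's NLP pipeline: the domain names, the keyword patterns in `SOCIAL_NEED_PATTERNS`, and the helper `flag_social_needs` are all hypothetical and chosen only for the example.

```python
import re

# Hypothetical keyword patterns per social-need domain (illustrative only;
# the lexicon used in the study is not described in this abstract).
SOCIAL_NEED_PATTERNS = {
    "housing": re.compile(r"\b(homeless(ness)?|eviction|evicted|unstable housing)\b", re.I),
    "food": re.compile(r"\b(food insecurity|food pantry|skipp(ed|ing) meals)\b", re.I),
    "transportation": re.compile(r"\b(no transportation|lacks? transportation)\b", re.I),
}


def flag_social_needs(note: str) -> dict:
    """Return a 0/1 indicator per social-need domain for one free-text note."""
    return {
        domain: int(bool(pattern.search(note)))
        for domain, pattern in SOCIAL_NEED_PATTERNS.items()
    }


# Toy notes, invented for the example.
notes = [
    "Patient reports recent eviction and is skipping meals.",
    "Follow-up for hypertension; no concerns raised.",
]
features = [flag_social_needs(n) for n in notes]
```

Each resulting indicator could be added as a column alongside the structured predictors. A plain keyword match like this ignores negation (for example, "denies homelessness"), so a production pipeline would need additional handling.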
Results
The logistic model (Model I) performed well (area under the curve [AUC] 0.703, 95% confidence interval [CI] 0.701:0.705), and the addition of EHR unstructured data (Model II) resulted in a slight decrease in the AUC (0.701, 95% CI 0.699:0.703). In the logistic models, the addition of EHR unstructured data increased the area under the precision-recall curve (PRC) from 0.255 (95% CI 0.254:0.256) in Model I to 0.378 (95% CI 0.375:0.38) in Model II. The GEE models performed similarly to the logistic models, and the addition of EHR unstructured data again resulted in a slight decrease in the AUC (0.702, 95% CI 0.699:0.705 in Model III versus 0.699, 95% CI 0.698:0.702 in Model IV).
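The AUC and the area under the precision-recall curve summarize different aspects of a model's ranking, so one can move while the other stays flat. The sketch below computes both on toy data invented for the example (not study data); `roc_auc` and `average_precision` are hypothetical helper names, and average precision is used here as one common step-wise estimate of the area under the precision-recall curve.

```python
def roc_auc(y_true, y_score):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, y_score):
    """Mean of the precision values at each true positive, ranked by score.

    Assumes no tied scores; this is a step-wise estimate of the PR-curve area.
    """
    ranked = sorted(zip(y_score, y_true), reverse=True)
    n_pos = sum(y_true)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / n_pos


# Toy example: 2 positives, 6 negatives, two hypothetical scoring models.
y_true = [1, 1, 0, 0, 0, 0, 0, 0]
scores_a = [0.8, 0.7, 0.9, 0.6, 0.5, 0.4, 0.3, 0.2]  # positives ranked 2nd and 3rd
scores_b = [0.9, 0.6, 0.8, 0.7, 0.5, 0.4, 0.3, 0.2]  # positives ranked 1st and 4th

auc_a, auc_b = roc_auc(y_true, scores_a), roc_auc(y_true, scores_b)
ap_a, ap_b = average_precision(y_true, scores_a), average_precision(y_true, scores_b)
```

In this toy case both score sets misorder the same number of positive-negative pairs, so their AUCs are equal, while the second set places a positive at the very top of the ranking and therefore has the higher average precision. When positives are rare, precision-based metrics are more sensitive to such changes at the top of the ranked list.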
Discussion
Our work presents an enhanced version of a novel social risk score that integrates community-level data with patient-level data. The score is intended to systematically identify patients at increased risk of future social needs, so that they can receive an in-depth assessment of those needs and, where appropriate, referral to community-based organizations that can address them.
Conclusion
Adding information on social needs extracted from unstructured EHR data improved the prediction of positive cases, as reflected by the increase in the PRC.