Objective: Electronic health records (EHRs) are a resource for "big data" analytics, containing a variety of data elements. We investigate how different categories of information contribute to the prediction of mortality over different time horizons among patients undergoing hemodialysis treatment. Materials and Methods: Using LASSO (least absolute shrinkage and selection operator) regression, we derived prediction models for mortality over 7 time horizons from EHR data on older patients from a national chain of dialysis clinics, linked with administrative data. We assessed how different categories of information relate to risk assessment and compared discrete-time models to time-to-event models. Results: The best-performing models used all the available data (c-statistic range, 0.72 to 0.76), with stronger performance in the near term. While different variable groups showed different utility, exclusion of any particular group did not lead to a meaningfully different risk assessment. Discrete-time models performed better than time-to-event models. Conclusions: Different variable groups were predictive over different time horizons, with vital signs most predictive of near-term mortality and demographics and comorbidities more important for long-term mortality.
Key words: Electronic Health Records, hemodialysis, ESRD, predictive modeling
OBJECTIVE
The increasing availability of electronic health records (EHRs) offers vast and unique opportunities for biomedical research, at the core of which are predictive modeling and analytics. The presence of diverse data elements allows for the construction of prediction models using a wealth of predictors that may not all be available in more standard settings. Depending on the EHR and other linkable sources, it is generally possible to ascertain information on patients' demographics, health service utilization, diagnosed comorbidities, prescribed medications, results from laboratory tests, and vital signs. In total, these represent a source for "big data" analytics in clinical research. Owing to this variety of data elements, EHRs present the opportunity to develop risk models over a range of time horizons. Studies have predicted events from within the next 12 hours 1 to up to 8 years. 2
Examples include predicting the risk of adverse outcomes, including administrative events such as hospital readmission, 3 and discrete clinical events such as acute kidney injury. 4 As such, each study has abstracted different pieces of information from its respective EHR. With this in mind, we sought to explore the role EHR data elements play in predicting mortality over different time horizons. This could inform future model development efforts by prioritizing limited resources toward collecting the predictors that provide the most useful