Objective: The COVID-19 pandemic has placed a heavy burden on healthcare systems worldwide and caused huge social disruption and economic loss. Many deep learning models have been proposed to conduct clinical predictive tasks, such as mortality prediction for COVID-19 patients in intensive care units, using Electronic Health Record (EHR) data. Despite the initial success of these models in certain clinical applications, there is currently a lack of benchmarking results that would allow a fair comparison and the selection of the optimal model for clinical use. Furthermore, there is a discrepancy between the formulation of traditional prediction tasks and real-world clinical practice in intensive care.
Methods: To fill these gaps, we propose two clinical prediction tasks for COVID-19 patients in intensive care units: Outcome-specific length-of-stay prediction and Early mortality prediction. The two tasks are adapted from the naive length-of-stay and mortality prediction tasks to accommodate the clinical practice for COVID-19 patients. We propose fair, detailed, open-source data-preprocessing pipelines and evaluate 17 state-of-the-art predictive models on the two tasks, including 5 machine learning models, 6 basic deep learning models and 6 deep learning predictive models specifically designed for EHR data.
Results: We provide fair, reproducible benchmarking results for the two tasks using data from two real-world COVID-19 EHR datasets. One dataset is publicly available without needing any inquiry, and the other can be accessed on request.
Conclusions: We deploy all experiment results and models on an online platform. Clinicians and researchers can also upload their data to the platform and get quick prediction results using our trained models. We hope our efforts can further facilitate deep learning and machine learning research for COVID-19 predictive modeling.
Software Repository: https://github.com/yhzhu99/covid-ehr-benchmarks
Introduction
The COVID-19 pandemic needs no introduction. As of May 2022, the virus has caused over 500 million infections and over 6 million deaths [1]. Though research shows that new variants of COVID-19 are less deadly, they are more transmissible, and the number of cases is still surging globally [2]. Under these circumstances, early risk prediction and estimation of disease progression, especially for COVID-19 patients in intensive care units, have become important for allocating limited medical resources and relieving the burden on healthcare systems.
Electronic health record (EHR) data and intelligent models offer a viable way to address this challenge. Many machine learning and deep learning models have been proposed that use COVID-19 patients' EHR data to conduct clinical prediction tasks, including severity [3-12], diagnosis [13, 14], length-of-stay (LOS) [15, 16], etc. Many earlier general-purpose EHR predictive models can also be applied to COVID-19 prediction tasks. These works achieve better prediction performances compared with ...