Background
The COVID-19 pandemic is placing severe strain on healthcare systems. Several prognostic models have been validated, but few of them are implemented in daily practice. The objective of the study was to validate a machine-learning risk prediction model using easy-to-obtain parameters, potentially available at home, to help identify patients with COVID-19 who are at higher risk of death.

Methods
The training cohort included all patients admitted to Fondazione Policlinico Gemelli with COVID-19 from March 5, 2020 to November 5, 2020. The model was then tested on all patients admitted to the same hospital with COVID-19 from November 6, 2020 to February 5, 2021. The primary outcome was in-hospital mortality. The out-of-sample performance of the model was estimated from the training set in terms of the area under the receiver operating characteristic curve (AUROC) and classification matrix statistics, by averaging the results of 5-fold cross-validation repeated 3 times and comparing them with the results obtained on the test set. An explanation analysis of the model, based on SHapley Additive exPlanations (SHAP), is also presented. To assess the subsequent time evolution, the change in PaO2/FiO2 (P/F) at 48 hours after the baseline measurement was plotted against its baseline value.

Results
Among the 921 patients included in the training cohort, 120 died (13%). The variables selected for the model were age, platelet count, SpO2, blood urea nitrogen (BUN), hemoglobin, C-reactive protein, neutrophil count, and sodium. The 5-fold cross-validation repeated 3 times gave an AUROC of 0.87 and, at the threshold given by the Youden index, the following classification matrix statistics: sensitivity 0.840, specificity 0.774, negative predictive value 0.971. The model was then tested on a new population (n=1463) in which the mortality rate was 22.6%. On this test set, the model showed an AUROC of 0.818, sensitivity 0.813, specificity 0.650, and negative predictive value 0.922.
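As an illustration of how the statistics reported above are defined, the following minimal sketch computes the AUROC and picks the cut-off that maximises the Youden index (sensitivity + specificity - 1) on a small set of predicted risks. It is not the authors' code: the function names and the toy data are hypothetical, and the sketch uses only the Python standard library and does not reproduce the repeated cross-validation.

```python
from typing import NamedTuple, Sequence


class ThresholdStats(NamedTuple):
    threshold: float
    sensitivity: float
    specificity: float
    npv: float  # negative predictive value


def confusion_stats(y_true: Sequence[int], scores: Sequence[float],
                    threshold: float) -> ThresholdStats:
    """Classify score >= threshold as predicted death and summarise the result."""
    tp = fn = tn = fp = 0
    for y, s in zip(y_true, scores):
        predicted_death = s >= threshold
        if y == 1 and predicted_death:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted_death:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    npv = tn / (tn + fn) if tn + fn else 0.0
    return ThresholdStats(threshold, sensitivity, specificity, npv)


def youden_threshold(y_true: Sequence[int],
                     scores: Sequence[float]) -> ThresholdStats:
    """Return the statistics at the cut-off maximising the Youden index."""
    candidates = sorted(set(scores))
    return max(
        (confusion_stats(y_true, scores, t) for t in candidates),
        key=lambda st: st.sensitivity + st.specificity - 1.0,
    )


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the fraction of (death, survivor) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = died in hospital, 0 = survived.
y_true = [0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.60, 0.70, 0.90]

best = youden_threshold(y_true, scores)
area = auroc(y_true, scores)
print(best, round(area, 3))
```

On real data the cut-off would be chosen on the training folds and then applied unchanged to the test set, so that the test-set sensitivity and specificity are not optimistically biased.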
Mortality was 1.6% in the first quartile of the predicted risk score (low-risk score group), 17.8% in the second and third quartiles (high-risk score group), and 53.5% in the fourth quartile (very high-risk score group). The three risk score groups showed good discrimination for the P/F value at admission and, for the low-risk class, a positive correlation was found with P/F at 48 hours after admission (adjusted R-squared = 0.48).

Conclusions
We developed a model predicting death in people with SARS-CoV-2 infection that includes only easy-to-obtain variables (abnormal blood count, BUN, C-reactive protein, sodium and lower SpO2). It demonstrated good accuracy and high discriminative power. The simplicity of the model makes the risk prediction applicable to patients at home, in the Emergency Department, or during hospitalization.
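The quartile-based risk groups described in the Results can be sketched as follows. This is only an illustrative reading of the grouping (first quartile = low, second and third quartiles = high, fourth quartile = very high), not the authors' implementation: the function names, the choice of the "inclusive" quantile method, and the toy data are all assumptions.

```python
from statistics import quantiles
from typing import Dict, Sequence


def assign_risk_group(score: float, q1: float, q3: float) -> str:
    """Map a predicted risk score to one of the three groups."""
    if score <= q1:
        return "low"        # first quartile
    if score <= q3:
        return "high"       # second and third quartiles
    return "very high"      # fourth quartile


def mortality_by_group(scores: Sequence[float],
                       died: Sequence[int]) -> Dict[str, float]:
    """Observed mortality rate within each risk score group."""
    # Quartile cut points of the predicted scores (assumed "inclusive" method).
    q1, _median, q3 = quantiles(scores, n=4, method="inclusive")
    counts = {"low": [0, 0], "high": [0, 0], "very high": [0, 0]}
    for score, outcome in zip(scores, died):
        group = assign_risk_group(score, q1, q3)
        counts[group][0] += outcome  # deaths in the group
        counts[group][1] += 1        # patients in the group
    return {
        group: (deaths / n if n else float("nan"))
        for group, (deaths, n) in counts.items()
    }


# Hypothetical toy data: predicted risk scores and outcomes (1 = died).
scores = [0.02, 0.05, 0.10, 0.15, 0.25, 0.30, 0.60, 0.80]
died = [0, 0, 0, 1, 0, 1, 1, 1]

mortality = mortality_by_group(scores, died)
print(mortality)
```

A rising mortality rate from the low to the very high group, as reported in the Results, is what one would expect from a score with good discrimination.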