Multiple prediction models for the risk of in-hospital mortality from COVID-19 have been developed, but they have not been applied to patient cohorts different from those from which they were derived. The MEDLINE, EMBASE, Scopus, and Web of Science (WOS) databases were searched. Risk of bias and applicability were assessed with PROBAST. Nomograms whose variables were available in a well-defined cohort of 444 patients from our site were externally validated. Overall, 71 studies that derived a clinical prediction rule for mortality from COVID-19 were identified. Predictive variables consisted of combinations of patients' age, chronic conditions, dyspnea/tachypnea, radiographic chest alterations, and analytical values (LDH, CRP, lymphocytes, D-dimer), together with markers of respiratory, renal, liver, and myocardial damage, which were major predictors in several nomograms. Twenty-five models could be externally validated. Areas under the receiver operating characteristic curve (AUROC) for predicting mortality ranged from 0.71 to 1 in the derivation cohorts; C-index values ranged from 0.823 to 0.970. Overall, 37/71 models provided very-good-to-outstanding test performance. When externally validated in our cohort, the nomograms showed lower predictive performance for mortality than in their respective derivation cohorts, with AUROC values of 0.654 to 0.806 (poor to acceptable performance). We conclude that available nomograms were limited in predicting mortality when applied to populations different from those from which they were derived.
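The AUROC values summarized above measure discrimination: the probability that a randomly chosen patient who died was assigned a higher predicted risk than a randomly chosen survivor. The following Python sketch computes this directly from the pairwise definition, which is equivalent to the normalized Mann-Whitney U statistic; for a binary outcome such as in-hospital death, Harrell's C-index reduces to the same quantity. It is a minimal illustration under stated assumptions, not the validation procedure used in this study: the function name `auroc` and the toy risks in the example are hypothetical and are not taken from the study's data.

```python
from itertools import product


def auroc(risks, died):
    """Return the AUROC of predicted risks against observed in-hospital death.

    Computed from the pairwise (Mann-Whitney) definition: the fraction of
    (non-survivor, survivor) pairs in which the non-survivor received the
    higher predicted risk, with tied risks counted as one half.
    """
    deaths = [r for r, y in zip(risks, died) if y]
    survivors = [r for r, y in zip(risks, died) if not y]
    if not deaths or not survivors:
        raise ValueError("need at least one death and one survivor")

    concordant = 0.0
    for d, s in product(deaths, survivors):
        if d > s:
            concordant += 1.0  # correctly ordered pair
        elif d == s:
            concordant += 0.5  # tie: no information either way
    return concordant / (len(deaths) * len(survivors))


if __name__ == "__main__":
    # Hypothetical toy data, not from the study: the first three patients died.
    risks = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1]
    died = [1, 1, 1, 0, 0, 0]
    print(round(auroc(risks, died), 3))  # prints 0.889
```

In the toy example, 8 of the 9 death/survivor pairs are correctly ordered (the non-survivor with risk 0.3 is ranked below the survivor with risk 0.6), giving 8/9. In an external validation, the risks would come from applying a published nomogram's scoring formula to each patient in the new cohort. The pairwise loop costs one comparison per death/survivor pair, which is negligible for a cohort of a few hundred patients; rank-based implementations scale better for large datasets.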