Background: No single multimorbidity measure has been validated for use in NHS England's General Practice Extraction Service Data for Pandemic Planning and Research (GDPPR), the nationwide primary care dataset created for coronavirus disease 2019 (COVID-19) pandemic research. A single multimorbidity measure is advantageous when an analysis must adjust for multimorbidity, for example when modelling the effectiveness of COVID-19 vaccination, because including many individual morbidities as separate covariates is challenging. The Cambridge Multimorbidity Score (CMMS) is a validated tool for predicting mortality risk. However, the set of Systematised Nomenclature of Medicine Clinical Terms (SNOMED CT) codes available in the GDPPR dataset is limited and does not define all the conditions used to calculate the CMMS.
Objective: To develop and validate a modified version of the CMMS using the clinical terms available in the GDPPR dataset.
Methods: We used pseudonymised data from the Oxford-Royal College of General Practitioners Research and Surveillance Centre (RCGP RSC), whose SNOMED CT code list is more extensive than that of the GDPPR. From the 37 conditions in the original CMMS model, we selected conditions with either: (a) a high prevalence ratio (≥85%), calculated as the prevalence in the RSC dataset when conditions are defined using the GDPPR set of SNOMED CT codes, divided by the prevalence when they are defined using the RSC set of SNOMED CT codes; or (b) a lower prevalence ratio but high predictive value. The resulting set of conditions was entered into Cox proportional hazards models of 1-year mortality in a development dataset (n=300,000) to construct a new CMMS model. Following the original CMMS, variable reduction and parsimony were achieved by backward elimination, using the Akaike information criterion as the stopping rule. Model validation involved obtaining 1-year mortality estimates for a synchronous dataset (n=150,000) and 1-year and 5-year mortality estimates for an asynchronous dataset (n=150,000).
Results: The initial model contained 22 conditions and the final model included 17. These conditions overlapped with those of a modified CMMS that we previously developed using RSC data and the more extensive RSC SNOMED CT list. Discrimination for 1-year mortality was high in both the derivation and validation datasets (Harrell's C=0.92), and discrimination for 5-year mortality was slightly lower (Harrell's C=0.90). Calibration was reasonable after adjustment for overfitting. Performance was similar to that of both the original CMMS and the previously developed modified CMMS models.
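The condition-selection rule and the discrimination metric described in the Methods can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the threshold expressed as the fraction 0.85, and passing the "high predictive value" conditions in as a pre-computed set are all hypothetical, since the abstract does not say how predictive value was judged. Because both prevalences are measured in the same RSC population, the prevalence ratio reduces to a ratio of case counts. The Harrell's C function is the simple pairwise version that skips pairs with tied times; the sketch does not fit the Cox models or reproduce the backward elimination.

```python
from typing import Dict, List, Sequence, Set


def prevalence_ratio(cases_gdppr_codes: int, cases_rsc_codes: int) -> float:
    """Prevalence ratio of one condition within the RSC population.

    Numerator: cases found using the GDPPR SNOMED CT code list.
    Denominator: cases found using the (larger) RSC code list.
    Both prevalences share the same population denominator, so the
    ratio of prevalences equals the ratio of case counts.
    """
    if cases_rsc_codes <= 0:
        raise ValueError("no cases under the RSC definition")
    return cases_gdppr_codes / cases_rsc_codes


def select_conditions(
    ratios: Dict[str, float],
    high_predictive: Set[str],
    threshold: float = 0.85,
) -> List[str]:
    """Keep a condition if (a) its ratio >= threshold, or (b) it appears in a
    hypothetical, externally supplied set of high-predictive-value conditions."""
    return sorted(
        name for name, r in ratios.items() if r >= threshold or name in high_predictive
    )


def harrell_c(
    times: Sequence[float], events: Sequence[int], risk: Sequence[float]
) -> float:
    """Harrell's concordance index (simple pairwise version).

    A pair (i, j) is comparable when subject i had the event and subject j
    was still alive afterwards (times[j] > times[i]). The pair is concordant
    when i, who died first, has the higher risk score; tied risk scores
    count as half. Pairs with tied times are skipped.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # censored subjects cannot anchor a comparable pair
        for j in range(len(times)):
            if times[j] > times[i]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Toy numbers only; not data from the study.
    print(prevalence_ratio(170, 200))
    print(select_conditions({"a": 0.9, "b": 0.4, "c": 0.6}, {"c"}))
    print(harrell_c([1, 2, 3, 4], [1, 1, 0, 1], [4, 3, 2, 1]))
```

In the toy call, condition "a" is kept by criterion (a), "c" is kept only because it is flagged as predictive, and "b" is dropped. A risk score that ranks every earlier death above every later survivor gives a C of 1.0, while 0.5 corresponds to chance ranking.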