Objective
To describe COVID-19 subphenotypes regarding severity patterns including prognostic, ICU and morbimortality outcomes, through stratification based on gender and age groups, as described by inter-patient variability patterns in clinical phenotypes and demographic features.
Materials and methods
We used the COVID-19 open data from the Mexican Government including patient-level epidemiological and clinical data from 778 692 SARS-CoV-2 patients from January 13, 2020 to September 30, 2020. Inter-patient variability was analyzed by combining dimensionality reduction and hierarchical clustering methods. We produced cluster analyses for all combinations of gender and age groups (<18, 18-49, 50-64, and >64). For each group, the optimum number of clusters was selected combining a quantitative approach using the Silhouette coefficient, and a qualitative approach through a subgroup expert inspection via visual analytics. Using the features of the resultant age-gender clusters, we performed a meta-clustering analysis to provide an overall description of the population.
Results
We observed a total of 56 age-gender clusters, grouped in 11 clinically distinguishable meta-clusters with different outcomes. Meta-clusters 1 to 3 showed the highest recovery rates (90.27-95.22%). These clusters include: (1) healthy patients of all ages, (2) children with comorbidities who had priority in medical resources, (3) young patients with obesity and smoking habit. Meta-clusters 4 and 5 showed moderate recovery rates (81.3-82.81%): (4) patients with hypertension or diabetes of all ages, (5) typical obese patients with three highly correlated conditions, namely, pneumonia, hypertension and diabetes. Meta-clusters 6 to 11 had very low recovery rates (53.96-66.94%) which include: (6) immunosuppressed patients with the highest comorbidity rate in many diseases, (7) CKD patients with the worse survival length and recovery, (8) elderly smoker with mild COPD, (9) severe diabetic elderly with hypertension, (10, 11) oldest obese smokers with severe COPD and mild cardiovascular disease with the latter (11) showing a relatively higher age and smoke rate, severe COPD and shorter survival length, reinforcing a high correlation between smoking habit and COPD among elderly. Additionally, the source Mexican state and type of clinical institution proved to be an important factor for heterogeneity in severity.
Discussion
The proposed unsupervised learning approach successfully uncovered discriminative COVID-19 severity patterns for both genders and all age groups from clinical phenotypes and demographic features. A careful read of group outcomes showed consistent results regarding recent literature. Regarding the Mexican population, our results suggest that habits and comorbidities may play a key role in predicting mortality in older patients. Centenarians tended to fall in the groups with better outcomes repeatedly. Additionally, immunosuppression was not found as a relevant factor for severity alone but did when present along with chronic kidney disease. Further useful correlations could be found by evaluating the duration of unhealthy habits, demographic features, comorbidities, the time since diagnosis, recovery progress, readmission record, and the effect of source variability.
Conclusion
The resultant eleven meta-clusters provide bases to comprehend the classification of patients with COVID-19 based on comorbidities, habits, demographic characteristics, geographic data and type of clinical institutions, as well as revealing the correlations between the above characteristics thereby help to anticipate the possible clinical outcomes for every specifically characterized patient. These subphenotypes can establish target groups for automated stratification or triage systems to provide personalized therapies or treatments.
Code available at: https://github.com/bdslab-upv/covid19-metaclustering
Dynamic results visualization at: http://covid19sdetool.upv.es/?tab=mexicoGov