Background
Although variation in the long-term course of major depressive disorder (MDD) is not strongly predicted by existing symptom subtype distinctions, recent research suggests that prediction can be improved by using machine learning methods. However, it is not known whether these distinctions can be refined by adding information about comorbid conditions. The current report presents results on this question.
Methods
Data come from 8,261 respondents with lifetime DSM-IV MDD in the WHO World Mental Health (WMH) Surveys. Outcomes include four retrospectively reported measures of persistence-severity of course (years in episode; years in chronic episodes; hospitalization for MDD; disability due to MDD). Machine learning methods (regression tree analysis; lasso, ridge, and elastic net penalized regression) followed by k-means cluster analysis were used to augment previously detected subtypes with information about prior comorbidity to predict these outcomes.
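The two-stage approach described above (penalized regression for each outcome, then k-means clustering of the predicted values) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic and are not the WMH data, all four outcomes are treated as continuous, only the elastic net is shown (the regression tree, lasso, and ridge steps are omitted), and the variable names and scikit-learn settings are hypothetical rather than the authors' own.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import ElasticNetCV
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT the WMH data): 300 respondents and 10 predictors
# standing in for symptom-subtype and prior-comorbidity indicators.
n, p = 300, 10
X = rng.normal(size=(n, p))

# Four hypothetical outcomes driven by a shared latent risk score plus noise.
latent_risk = X[:, :3].sum(axis=1)
Y = np.column_stack([latent_risk + rng.normal(size=n) for _ in range(4)])

X_std = StandardScaler().fit_transform(X)

# Stage 1: fit one elastic net model per outcome and keep its predicted values.
predicted = np.column_stack([
    ElasticNetCV(l1_ratio=0.5, cv=5, random_state=0)
    .fit(X_std, Y[:, j])
    .predict(X_std)
    for j in range(Y.shape[1])
])

# Stage 2: k-means with three clusters on the standardized predicted values.
pred_std = StandardScaler().fit_transform(predicted)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(pred_std)

# Relabel clusters by their mean predicted value: 0 = low, 1 = intermediate, 2 = high.
cluster_means = [pred_std[labels == k].mean() for k in range(3)]
rank_of = {int(k): rank for rank, k in enumerate(np.argsort(cluster_means))}
risk_group = np.array([rank_of[int(label)] for label in labels])
```

Clustering the predicted values rather than the raw predictors groups respondents by expected course across all four outcomes at once, which is the property the cluster analysis relies on.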
Results
Predicted values were strongly correlated across outcomes. Cluster analysis of predicted values found 3 clusters with consistently high, intermediate, or low values. The high-risk cluster (32.4% of cases) accounted for 56.6–72.9% of high persistence, high chronicity, hospitalization, and disability. This high-risk cluster had both higher sensitivity and a higher likelihood-ratio positive (relative proportions of cases in the high-risk cluster versus other clusters having the adverse outcomes) than the corresponding cluster in a parallel analysis that excluded measures of comorbidity as predictors.
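The two summary measures reported above can be made concrete with a small worked example. The sketch follows the wording of the parenthetical definition in the text; whether the authors computed the ratio in exactly this way is an assumption, and the conventional likelihood-ratio positive is instead defined as sensitivity / (1 - specificity). The function names and the ten toy respondents are hypothetical and are not study data.

```python
def sensitivity(in_high_risk, has_outcome):
    """Share of all respondents with the outcome who fall in the high-risk cluster."""
    flags = [h for h, o in zip(in_high_risk, has_outcome) if o]
    return sum(flags) / len(flags)


def outcome_proportion_ratio(in_high_risk, has_outcome):
    """Proportion with the outcome in the high-risk cluster divided by the
    proportion with the outcome in all other clusters."""
    high = [o for h, o in zip(in_high_risk, has_outcome) if h]
    rest = [o for h, o in zip(in_high_risk, has_outcome) if not h]
    return (sum(high) / len(high)) / (sum(rest) / len(rest))


# Toy data: 10 hypothetical respondents, 4 of them in the high-risk cluster.
in_high_risk = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
has_outcome = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

sens = sensitivity(in_high_risk, has_outcome)                # 3 of 4 cases -> 0.75
ratio = outcome_proportion_ratio(in_high_risk, has_outcome)  # (3/4) / (1/6) -> 4.5
```

In this toy example the high-risk cluster holds 40% of respondents but 75% of those with the outcome, which mirrors the kind of concentration the Results describe.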
Conclusions
Although results using the retrospective data reported here suggest that useful MDD subtyping distinctions can be made with machine learning and clustering across multiple indicators of illness persistence-severity, replication is needed with prospective data to confirm this preliminary conclusion.