Background: Low back pain (LBP) is a heterogeneous disease with biological, physical, and psychosocial etiologies. Models for predicting LBP severity and chronicity have not made a clinical impact, perhaps because multidimensional phenotypes are difficult to decipher. In this study, our objective was to develop a computational framework to comprehensively screen metrics related to LBP severity and chronicity and identify the most influential ones.
Methods: We identified individuals from the observational, longitudinal Osteoarthritis Initiative (OAI) cohort (N = 4796) who reported LBP at enrollment (N = 215). OAI descriptor variables (N = 1190) were used to cluster individuals via unsupervised learning and uncover latent LBP phenotypes. We also developed a dimensionality reduction algorithm to visualize clusters/phenotypes using Uniform Manifold Approximation and Projection (UMAP). Next, to predict chronicity, we identified those with acute LBP (N = 40) and those with persistent LBP over 8 years of follow-up (N = 66) and built logistic regression and supervised machine learning models.
Results: We identified three LBP phenotypes: a "high socioeconomic status, low pain severity" group, a "low socioeconomic status, high pain severity" group, and an intermediate group. Mental health and nutrition were also key clustering variables, while traditional biomedical factors (e.g., age, sex, BMI) were not. Those who developed chronic LBP were differentiated by higher pain interference and lower alcohol consumption (a correlate of poor physical fitness and lower socioeconomic status). All models for predicting chronicity had satisfactory performance (accuracy 76%-78%).
Conclusions: We developed a computational pipeline capable of screening hundreds of variables and visualizing LBP cohorts. We found that socioeconomic status, mental health, nutrition, and pain interference were more influential in LBP than traditional biomedical descriptors like age, sex, and BMI.
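
As an illustration of the chronicity-prediction step, the following is a minimal sketch of a logistic regression classifier separating acute from persistent LBP. It is not the authors' implementation: the data are synthetic (not OAI data), the two standardized features (`pain interference`, `alcohol consumption`) and their group means are assumptions chosen only to mirror the direction of the reported differences, and the helper names (`fit_logistic`, `predict`, `make_group`), the 70/30 split, and the gradient-descent settings are all hypothetical. Only the group sizes (40 acute, 66 persistent) come from the abstract.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(X, w, b):
    """Return 1 (persistent) when the predicted probability is at least 0.5."""
    return [
        1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0
        for xi in X
    ]


# Synthetic stand-in data (assumption): 40 "acute" (label 0) and
# 66 "persistent" (label 1) individuals with two standardized features.
rng = random.Random(0)


def make_group(n, interference_mean, alcohol_mean, label):
    return [
        ([rng.gauss(interference_mean, 1.0), rng.gauss(alcohol_mean, 1.0)], label)
        for _ in range(n)
    ]


data = make_group(40, -0.5, 0.5, 0) + make_group(66, 0.5, -0.5, 1)
rng.shuffle(data)
X = [features for features, _ in data]
y = [label for _, label in data]

# Hold out roughly 30% of individuals for evaluation (assumed split).
split = int(0.7 * len(data))
w, b = fit_logistic(X[:split], y[:split])
preds = predict(X[split:], w, b)
accuracy = sum(p == t for p, t in zip(preds, y[split:])) / len(preds)

print(f"held-out accuracy: {accuracy:.2f}")
print(f"weights (pain interference, alcohol consumption): {w}")
```

With cohorts this small, a single train/test split gives a noisy accuracy estimate; repeated or cross-validated splits would be a more stable way to compare models.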