Amyotrophic lateral sclerosis (ALS) is a highly heterogeneous disease, and identifying ALS subtypes that better predict prognosis is essential in clinical practice. Traditional clustering methods are constrained by data volume and computational cost, and they struggle to capture this heterogeneity or to predict the prognosis of ALS. This study aimed to adopt a data-driven approach to develop a new classification system for ALS that predicts prognosis.
This was a single-center retrospective study of an independent cohort of patients diagnosed with ALS and recruited at Peking Union Medical College Hospital (PUMCH) from 2014 to 2020. The mean follow-up time was 23.3 ± 16.9 months. A total of 1,254 patients were recruited, and 303 baseline variables were selected. We applied machine learning (ML) algorithms and trained and tested their performance under different hyperparameters. Fifty-nine baseline variables and 1,166 patients (mean [SD] age, 53.3 [10.7] years; 518 [55.5%] male patients; mean [SD] ALSFRS-R score, 38.4 [6.3]) with <50% missing data were identified and used as input for the ML algorithm.
The K-means model classified patients into four phenogroups (phenogroup 0, n = 245; phenogroup 1, n = 428; phenogroup 2, n = 235; phenogroup 3, n = 258) whose outcomes differed from one another (hazard ratio, 1.12; 95% CI, 1.01-1.25; P < 0.001). In particular, phenogroups 0 and 3 differed significantly in ALSFRS-R decline, swallowing function, BMI decline, and prognosis. Phenogroups 1 and 2 had intermediate prognoses but differed significantly in sex ratio, mean age, weight, and complications.
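The phenogrouping step can be illustrated with a minimal Python sketch, assuming a purely numeric baseline matrix with missing values encoded as NaN. It applies the <50% missing-data threshold to both variables and patients (the abstract does not say which axis the rule applies to), fills remaining gaps with column means, standardizes each variable, and runs plain Lloyd's K-means with k = 4. The function names (`filter_missing`, `impute_and_scale`, `assign`, `kmeans`), the mean imputation, the z-scoring, and the random initialization are hypothetical choices for illustration, not the study's actual pipeline, and the synthetic data below stand in for the PUMCH cohort.

```python
import numpy as np


def filter_missing(X, max_missing=0.5):
    """Drop variables (columns), then patients (rows), whose NaN fraction is >= max_missing.

    Assumption: the <50% missing-data rule is applied to both axes.
    """
    col_ok = np.isnan(X).mean(axis=0) < max_missing
    X = X[:, col_ok]
    row_ok = np.isnan(X).mean(axis=1) < max_missing
    return X[row_ok], col_ok, row_ok


def impute_and_scale(X):
    """Fill NaNs with column means (assumed strategy), then z-score each column."""
    col_mean = np.nanmean(X, axis=0)
    X = np.where(np.isnan(X), col_mean, X)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    return (X - X.mean(axis=0)) / std


def assign(X, centers):
    """Index of the nearest center (squared Euclidean distance) for each row."""
    dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def kmeans(X, k=4, n_iter=100, seed=0):
    """Plain Lloyd's K-means with centers initialized from k random rows."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(X, centers)
        # Recompute each center; keep the old one if its cluster is empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign(X, centers), centers


# Usage on synthetic data (not study data): 4 shifted groups, 6 variables,
# plus a 7th variable that is about 80% missing and should be dropped.
rng = np.random.default_rng(42)
offsets = np.repeat(np.arange(4) * 5.0, 50)[:, None]
X = rng.normal(size=(200, 6)) + offsets
X[rng.random(X.shape) < 0.05] = np.nan
sparse = np.where(rng.random(200) < 0.8, np.nan, 1.0)[:, None]
X = np.hstack([X, sparse])

X_kept, col_ok, row_ok = filter_missing(X)
labels, centers = kmeans(impute_and_scale(X_kept), k=4)
print("variables kept:", int(col_ok.sum()), "patients kept:", int(row_ok.sum()))
print("phenogroup sizes:", np.bincount(labels, minlength=4))
```

Lloyd's algorithm only finds a local optimum that depends on the initial centers; a real analysis would typically use several restarts or k-means++ seeding and compare candidate values of k, in line with the hyperparameter testing the abstract mentions.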
Data-driven automated approaches could be used to develop a new classification system for ALS that predicts prognosis more precisely.
Trial Registration: Chinese Clinical Trial Registry, ChiCTR-IPR-15007385