Background:
To develop machine learning-based models that predict progression-free survival (PFS) and overall survival (OS) in patients with gliomas, and to explore how different feature selection methods affect the predictions.
Methods:
We included 505 patients (training cohort, n = 354; validation cohort, n = 151) with gliomas between January 1, 2011 and December 31, 2016. Clinical, neuroimaging, and molecular genetic data were collected retrospectively. The multi-causes discovering with structure learning (McDSL) algorithm, least absolute shrinkage and selection operator (LASSO) regression, and the Cox proportional hazards regression model were each used to identify predictors of 3-year PFS and OS. Eight machine learning classifiers were developed with 5-fold cross-validation to predict 3-year PFS and OS. The area under the curve (AUC) was used to evaluate the prognostic performance of the classifiers.
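The evaluation step described above (a classifier scored by AUC under 5-fold cross-validation) can be illustrated with a minimal sketch. It is not the study's code: the data are synthetic, the function names (`auc`, `fit_logistic`, `predict`, `cross_validated_auc`) are hypothetical, logistic regression is fitted by plain gradient descent, feature selection is omitted, and pooling the out-of-fold predictions into a single AUC is an assumption, since the abstract does not say how fold results were combined.

```python
import math
import random

def auc(labels, scores):
    """Rank-based AUC: chance that a random positive outscores a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # Ties between a positive and a negative count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def predict(w, x):
    """Predicted probability of the event; the last weight is the intercept."""
    z = w[-1] + sum(wj * xj for wj, xj in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=200):
    """Unregularized logistic regression fitted by batch gradient descent."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict(w, xi) - yi
            for j, xij in enumerate(xi):
                grad[j] += err * xij
            grad[-1] += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

def cross_validated_auc(X, y, k=5, seed=0):
    """Fit on k-1 folds, score the held-out fold, then pool all held-out scores."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = [0.0] * len(X)
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        w = fit_logistic([X[i] for i in train], [y[i] for i in train])
        for i in fold:
            scores[i] = predict(w, X[i])
    return auc(y, scores)

# Synthetic stand-in for a 3-year outcome (1 = event) with two features:
# the first is informative, the second is pure noise.
rng = random.Random(42)
X, y = [], []
for _ in range(200):
    label = rng.randint(0, 1)
    X.append([rng.gauss(1.5 * label, 1.0), rng.gauss(0.0, 1.0)])
    y.append(label)

cv_auc = cross_validated_auc(X, y)
print(f"5-fold cross-validated AUC: {cv_auc:.3f}")
```

Because every patient is scored only by a model that never saw that patient during fitting, the pooled AUC estimates out-of-sample discrimination rather than fit to the training data.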
Results:
McDSL identified four causal factors (tumor location, WHO grade, histologic type, and molecular genetic group) for 3-year PFS and OS, whereas LASSO and Cox identified a widely varying number of factors associated with 3-year PFS and OS. For each machine learning classifier, performance did not differ significantly among the feature sets selected by McDSL, LASSO, and Cox. Logistic regression yielded the best performance in predicting 3-year PFS based on McDSL (AUC, 0.872; 95% confidence interval [CI]: 0.828-0.916) and 3-year OS based on LASSO (AUC, 0.901; 95% CI: 0.861-0.940).
Conclusions:
McDSL is more reproducible than LASSO and the Cox model for feature selection. Logistic regression may offer the best performance in predicting 3-year PFS and OS in patients with gliomas.