Background
Accurate prediction of surgical risk is important to patients and physicians. Logistic regression (LR) models are typically used to estimate these risks. However, many alternative classification and prediction algorithms have been developed in the fields of data mining and machine learning. This study aimed to compare the performance of LR with that of several data mining algorithms for predicting 30-day surgical morbidity in children.
Methods
We used the 2012 National Surgical Quality Improvement Program-Pediatric dataset to compare the performance of 1) an LR model that assumed linearity and additivity (simple LR model), 2) an LR model incorporating restricted cubic splines and interactions (flexible LR model), 3) a support vector machine, 4) a random forest, and 5) boosted classification trees for predicting surgical morbidity.
Results
The ensemble-based methods showed significantly higher accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) than the simple LR model. However, none of the models performed better than the flexible LR model on any of these measures, or in terms of model calibration or discrimination.
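For readers less familiar with the threshold-based measures reported here, the following minimal Python sketch shows how accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) follow from the four cells of a confusion matrix. The function name `classification_metrics` and the counts are hypothetical, chosen only for illustration. They are not taken from the study, and this is not the authors' analysis code.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute five threshold-based measures from confusion-matrix counts.

    tp / fn: patients with morbidity predicted as positive / negative.
    tn / fp: patients without morbidity predicted as negative / positive.
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,      # all correct predictions
        "sensitivity": tp / (tp + fn),      # true positive rate
        "specificity": tn / (tn + fp),      # true negative rate
        "ppv": tp / (tp + fp),              # precision among predicted positives
        "npv": tn / (tn + fn),              # precision among predicted negatives
    }


# Hypothetical counts, invented for illustration only (not study data).
metrics = classification_metrics(tp=30, fp=20, tn=900, fn=50)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With an uncommon outcome such as the one in these invented counts, accuracy and specificity can be high while sensitivity stays low, which is one reason several measures are usually reported together with calibration and discrimination.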
Conclusion
Support vector machines, random forests, and boosted classification trees do not show better performance than LR for predicting pediatric surgical morbidity. After further validation, the flexible LR model derived in this study could be used to assist with clinical decision-making based on patient-specific surgical risks.