Background
Glioma, the most prevalent primary brain tumor, poses challenges in prognosis, particularly in the high-grade subclass, despite advanced treatments. The recent shift in tumor classification underscores the crucial role of isocitrate dehydrogenase (IDH) mutation status in the clinical care of glioma patients. However, conventional methods for determining IDH status, including biopsy, have limitations. Exploring the use of machine learning (ML) on magnetic resonance imaging (MRI) to predict IDH mutation status shows promise but encounters challenges in generalizability and translation into clinical practice because most studies either use single institution or homogeneous datasets for model training and validation. Our study aims to bridge this gap by using multi-institution data for model validation.
Methods
This retrospective study utilizes data from a large, annotated datasets for internal (377 cases from Yale New Haven Hospitals) and external validation (207 cases from facilities outside Yale New Haven Health). The six-step research process includes image acquisition, semi-automated tumor segmentation, feature extraction, model building with feature selection, internal validation, and external validation. An extreme gradient boosting ML model predicted the IDH mutation status, confirmed by immunohistochemistry.
Results
The ML model demonstrated high performance, with an Area under the Curve (AUC), Accuracy, Sensitivity, and Specificity in internal validation of 0.862, 0.865, 0.885, and 0.713, and external validation of 0.835, 0.851, 0.850, and 0.847.
Conclusion
The ML model, built on a heterogeneous dataset, provided robust results in external validation for the prediction task, emphasizing its potential clinical utility. Future research should explore expanding its applicability and validation in diverse global healthcare settings.