Background
Gliomas represent a biologically heterogeneous group of primary brain tumors whose uncontrolled cellular proliferation and diffuse infiltration render them almost incurable and lead to a grim prognosis. Recent comprehensive genomic profiling has greatly elucidated the molecular hallmarks of gliomas, including mutations in isocitrate dehydrogenase 1 and 2 (IDH1 and IDH2), loss of chromosomes 1p and 19q (1p/19q), and epidermal growth factor receptor variant III (EGFRvIII). Detection of these molecular alterations relies on ex vivo analysis of surgically resected tissue specimens, which are sometimes inadequate for testing and/or do not capture the spatial heterogeneity of the tumor.
Methods
We developed a method for noninvasive detection of radiogenomic markers of IDH in both lower-grade gliomas (WHO grade II and III tumors) and glioblastoma (WHO grade IV), of 1p/19q in IDH-mutant lower-grade gliomas, and of EGFRvIII in glioblastoma. Preoperative MRIs of 473 glioma patients were collected from 3 of the studies participating in the ReSPOND consortium (collection I: Hospital of the University of Pennsylvania [HUP; n = 248]; collection II: The Cancer Imaging Archive [TCIA; n = 192]; collection III: Ohio Brain Tumor Study [OBTS; n = 33]). The Neuro-Cancer Imaging Phenomics Toolkit (neuro-CaPTk), a modular platform for cancer imaging analytics and machine learning, was used to extract histogram, shape, anatomical, and texture features from delineated tumor subregions and to integrate these features with a support vector machine (SVM), yielding models predictive of IDH, 1p/19q, and EGFRvIII status. The models were validated in 3 configurations: (1) 70–30% training–testing splits or 10-fold cross-validation within individual collections, (2) 70–30% training–testing splits within the merged collections, and (3) training on one collection and testing on another.
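The three validation configurations can be illustrated with a minimal, dependency-free Python sketch. It is not the neuro-CaPTk implementation and makes several assumptions: the classifier is a linear SVM trained with Pegasos-style stochastic subgradient descent (a swapped-in stand-in for the SVM used in the study); the hypothetical `make_collection` helper generates two synthetic features per patient in place of the extracted histogram, shape, anatomical, and texture features; and z-scoring with training-set statistics is an assumed preprocessing step. All function names are illustrative.

```python
import random
import statistics

# Hypothetical stand-in for one imaging collection: 2 synthetic features per
# patient (the study used histogram, shape, anatomical, and texture features).
# Label +1 = marker present (e.g. IDH-mutant), -1 = marker absent.
def make_collection(n, site_shift, seed):
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = 1 if i % 2 == 0 else -1
        center = 2.0 * label + site_shift  # site_shift mimics a site/scanner offset
        X.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
        y.append(label)
    return X, y

def fit_scaler(X):
    # Assumed preprocessing: z-score each feature using training-set statistics.
    cols = list(zip(*X))
    return ([statistics.fmean(c) for c in cols],
            [statistics.pstdev(c) or 1.0 for c in cols])

def transform(X, scaler):
    means, sds = scaler
    # Append a constant 1.0 so the last weight acts as a (regularized) bias.
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] + [1.0] for row in X]

def train_linear_svm(X, y, lam=0.01, epochs=100, seed=0):
    # Linear SVM via Pegasos-style stochastic subgradient descent on the hinge
    # loss; a substitute for illustration, not the neuro-CaPTk SVM.
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # L2 shrinkage step
            if margin < 1.0:  # hinge loss is active for this sample
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w

def accuracy(w, X, y):
    preds = [1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1 for x in X]
    return sum(p == yi for p, yi in zip(preds, y)) / len(y)

def fit_and_score(X_tr, y_tr, X_te, y_te):
    scaler = fit_scaler(X_tr)  # test data never influences the scaler or model
    w = train_linear_svm(transform(X_tr, scaler), y_tr)
    return accuracy(w, transform(X_te, scaler), y_te)

def split_70_30(X, y, seed=0):
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = round(0.7 * len(idx))
    tr, te = idx[:cut], idx[cut:]
    return ([X[i] for i in tr], [y[i] for i in tr],
            [X[i] for i in te], [y[i] for i in te])

def cross_validate(X, y, k=10, seed=0):
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        tr = [i for i in idx if i not in held_out]
        scores.append(fit_and_score([X[i] for i in tr], [y[i] for i in tr],
                                    [X[i] for i in fold], [y[i] for i in fold]))
    return statistics.fmean(scores)

if __name__ == "__main__":
    coll_a = make_collection(120, site_shift=0.0, seed=1)  # hypothetical collection A
    coll_b = make_collection(80, site_shift=0.3, seed=2)   # hypothetical collection B
    merged = (coll_a[0] + coll_b[0], coll_a[1] + coll_b[1])
    print("config 1, 70-30 split   :", fit_and_score(*split_70_30(*coll_a)))
    print("config 1, 10-fold CV    :", cross_validate(*coll_a))
    print("config 2, merged 70-30  :", fit_and_score(*split_70_30(*merged)))
    print("config 3, train A/test B:", fit_and_score(*coll_a, *coll_b))
```

Fitting the scaler inside `fit_and_score` keeps test-set statistics out of training in every configuration. In configuration 3 the site offset (modeled here only by `site_shift`) is not removed, which is why cross-collection testing is a stricter check of generalization than a split within one collection.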
Results
In configuration 1, these models achieved classification accuracies of 86.74% for EGFRvIII (HUP), 85.45% for IDH (TCIA), and 75.15% for 1p/19q (TCIA). In configuration 2, the IDH model applied to the combined data (HUP + TCIA + OBTS) achieved an accuracy of 82.50%. In configuration 3, the IDH model trained on the TCIA dataset achieved an accuracy of 84.88% on the HUP dataset.
Conclusions
Using machine learning, IDH mutation, 1p/19q codeletion, and EGFRvIII status were predicted with high accuracy (75.15% to 86.74% across the validation configurations). Neuro-CaPTk includes all the pipelines required to replicate these analyses in multi-institutional settings and could also be used for other radio(geno)mic analyses.