Purpose
Large datasets are required to ensure reliable non-invasive glioma assessment with radiomics-based machine learning methods. Such datasets can often only be obtained by pooling images from different centers. Moreover, trained models should perform with high accuracy when applied to data from different centers. In this study, the impact of reconstruction settings and segmentation methods on radiomic features derived from amino acid and TSPO PET images of glioma patients was examined. Additionally, it was investigated whether the resulting feature differences could be modeled and thus reduced.
Methods
[18F]FET and [18F]GE-180 PET data were acquired from 19 glioma patients. For each acquisition, 10 reconstruction settings and 9 segmentation methods were applied to emulate multicentric data. Statistical robustness measures were calculated before and after ComBat harmonization. Differences between features due to setting variations were assessed using the Friedman test, the coefficient of variation (CV), and inter-rater reliability measures, including intraclass and Spearman's rank correlation coefficients and Fleiss' kappa.
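For a single feature arranged as a patients × settings table, the Friedman test and a CV can be sketched in plain Python. This is a minimal sketch under stated assumptions: the helper names are hypothetical, the Friedman statistic omits the tie correction, the p-value uses the usual chi-square approximation with k - 1 degrees of freedom, and the CV is computed per patient across settings (sample SD over absolute mean) and then averaged, which may differ from the exact definition used in the study.

```python
import math
import statistics

def rank_with_ties(values):
    """Return 1-based ranks; tied values receive their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for m in range(i, j + 1):
            ranks[order[m]] = (i + j) / 2 + 1  # average of positions i..j, 1-based
        i = j + 1
    return ranks

def friedman_statistic(table):
    """Friedman chi-square for table[patient][setting] (no tie correction)."""
    n, k = len(table), len(table[0])
    rank_sums = [0.0] * k
    for row in table:  # each patient is one block, ranked across settings
        for j, r in enumerate(rank_with_ties(row)):
            rank_sums[j] += r
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)

def chi2_sf(x, df):
    """Chi-square upper-tail probability via the regularized lower gamma series."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > 1e-15 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)

def mean_cv_percent(table):
    """Mean over patients of the across-setting CV (sample SD / |mean|), in percent."""
    cvs = [statistics.stdev(row) / abs(statistics.fmean(row)) * 100 for row in table]
    return statistics.fmean(cvs)

# Hypothetical feature: every setting adds a small systematic offset.
table = [[b + 0.1 * j for j in range(4)] for b in (1.0, 2.0, 3.0, 5.0, 8.0)]
q = friedman_statistic(table)
print(q, chi2_sf(q, len(table[0]) - 1), mean_cv_percent(table))
```

A feature would then count as significantly different between settings if `chi2_sf(q, k - 1)` falls below the chosen significance level; the level and any multiple-comparison correction used in the study are not stated here.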
Results
According to the Friedman test, most features (> 60%) showed significant differences between settings. However, CV and inter-rater reliability measures indicated higher robustness. According to the Friedman test, ComBat achieved almost complete harmonization (> 87%), whereas CV and inter-rater reliability measures showed little to no improvement. [18F]GE-180 features were more sensitive to reconstruction settings than [18F]FET features.
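Why the Friedman test responds so strongly to harmonization can be illustrated with a simplified location/scale adjustment. This sketch is not full ComBat: it omits the empirical Bayes shrinkage of the batch parameters and any covariates, treats each setting as one batch, and assumes equal batch sizes; `location_scale_harmonize` and the example scale/shift values are hypothetical.

```python
import statistics

def location_scale_harmonize(table):
    """Simplified, ComBat-like location/scale adjustment of one feature.

    table[patient][setting]; each setting (column) is treated as one batch.
    Not full ComBat: no empirical Bayes shrinkage and no covariates.
    """
    n_patients, n_settings = len(table), len(table[0])
    columns = [[row[j] for row in table] for j in range(n_settings)]
    # Pooled target: grand mean and root of the averaged within-setting variances
    # (equal batch sizes assumed, so a plain average of variances is used).
    grand_mean = statistics.fmean(v for col in columns for v in col)
    pooled_sd = statistics.fmean(statistics.pvariance(col) for col in columns) ** 0.5
    harmonized = [[0.0] * n_settings for _ in range(n_patients)]
    for j, col in enumerate(columns):
        mean_j, sd_j = statistics.fmean(col), statistics.pstdev(col)
        if sd_j == 0:
            raise ValueError(f"setting {j} has zero spread; cannot rescale")
        for i, value in enumerate(col):
            # Remove this setting's location/scale, then map onto the pooled values.
            harmonized[i][j] = grand_mean + pooled_sd * (value - mean_j) / sd_j
    return harmonized

# Hypothetical feature with a purely affine (scale, shift) effect per setting.
bases = [1.0, 2.0, 3.0, 5.0, 8.0]
effects = [(1.0, 0.0), (1.2, 0.5), (0.8, -0.3)]
raw = [[b * s + o for s, o in effects] for b in bases]
for row in location_scale_harmonize(raw):
    print([round(v, 6) for v in row])
```

In this toy example the setting effects are exactly affine, so the harmonized values coincide across settings and a Friedman test would find no differences. With real data, patient-specific deviations that are not explained by a per-setting shift and scale remain after the adjustment.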
Conclusions
According to the Friedman test, feature distributions could be successfully aligned using ComBat. However, depending on the settings, changes in patient ranks were observed for some features, and these could not be eliminated by harmonization. Thus, for clinical use, it is recommended to exclude the affected features.
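Rank changes survive harmonization because, when no covariates are included, the ComBat adjustment of one feature within one batch is an affine map with a positive scale factor, so it cannot reorder patients within a setting. Spearman's rho between two settings is therefore the same before and after. A minimal check is sketched below, assuming per-setting z-scoring as a stand-in for the harmonization step and a hypothetical exclusion threshold `RHO_THRESHOLD = 0.9` that is not taken from the study; all helper names are illustrative.

```python
import itertools
import statistics

RHO_THRESHOLD = 0.9  # hypothetical cut-off for illustration, not from the study

def ranks(values):
    """1-based ranks; tied values receive their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for m in range(i, j + 1):
            result[order[m]] = (i + j) / 2 + 1
        i = j + 1
    return result

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

def min_pairwise_spearman(table):
    """Lowest Spearman's rho (Pearson of ranks) between any two settings."""
    n_settings = len(table[0])
    cols = [[row[j] for row in table] for j in range(n_settings)]
    return min(pearson(ranks(cols[a]), ranks(cols[b]))
               for a, b in itertools.combinations(range(n_settings), 2))

def zscore_columns(table):
    """Per-setting standardization: a positive-scale affine map within each setting."""
    stats = [(statistics.fmean(c), statistics.pstdev(c)) for c in zip(*table)]
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in table]

def flag_for_exclusion(table):
    """Flag a feature whose patient ranking is unstable across settings."""
    return min_pairwise_spearman(table) < RHO_THRESHOLD

# Patients 2 and 3 swap ranks between the two hypothetical settings.
swapped = [[1.0, 1.0], [2.0, 3.0], [3.0, 2.0], [4.0, 4.0]]
print(min_pairwise_spearman(swapped), min_pairwise_spearman(zscore_columns(swapped)))
print(flag_for_exclusion(swapped))
```

Both printed rho values are equal, which mirrors the observation that harmonization aligns distributions without restoring a consistent patient ranking.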