Purpose: Despite its increasing application, radiomics has not yet demonstrated solid reliability, because analyses are difficult to replicate. Extracting radiomic features from clinical MRI (T1w/T2w) is even more challenging because image intensities lack well-defined units (unlike, e.g., HU in CT). Several preprocessing steps are required before radiomic features are estimated; one of these is intensity normalization, which can be performed using different methods. The aim of this work was to evaluate the effect of three normalization techniques, applied to T2w-MRI images of the pelvic region, on the reproducibility of radiomic features.
Methods: T2w-MRI acquired before (MRI1) and 12 months after radiotherapy (MRI2) from 14 patients treated for prostate cancer were considered. Four conditions were analyzed: (a) the original MRI (No_Norm); (b) MRI normalized by the mean image value (Norm_Mean); (c) MRI normalized by the mean value of the urine in the bladder (Norm_ROI); (d) MRI normalized by the histogram-matching method (Norm_HM). Ninety-one radiomic features were extracted from three organs of interest (prostate, internal obturator muscles and penile bulb) at both time-points on each image, after discretization with a fixed bin-width approach, and the difference between the two time-points was calculated (Dfeature). To estimate the effect of the normalization methods on the reproducibility of radiomic features, the intraclass correlation coefficient (ICC) was calculated in three analyses: (a) on the features extracted on MRI2, first considering the four conditions together and then considering each method separately with respect to No_Norm; (b) on the features extracted on MRI2 in the four conditions with respect to the inter-observer variability in region of interest (ROI) contouring, also considering the effect of the discretization approach; (c) on Dfeature, to evaluate whether some indices recover consistency when differences are calculated.
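The three normalization conditions and the fixed bin-width discretization described in the Methods can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names are hypothetical, normalization "by" a mean is assumed to be a division, histogram matching is assumed to be plain CDF matching against a reference image chosen by the user, and the discretization follows one common fixed bin-width convention.

```python
import numpy as np


def norm_mean(image):
    """Norm_Mean (assumed form): divide by the mean intensity of the whole image."""
    return image / image.mean()


def norm_roi(image, roi_mask):
    """Norm_ROI (assumed form): divide by the mean intensity inside a reference ROI,
    e.g. a boolean mask of the urine in the bladder."""
    return image / image[roi_mask].mean()


def norm_hm(image, reference):
    """Norm_HM (assumed form): remap intensities so that the cumulative histogram
    of the image matches that of a reference image."""
    values, inverse, counts = np.unique(
        image.ravel(), return_inverse=True, return_counts=True
    )
    ref_values, ref_counts = np.unique(reference.ravel(), return_counts=True)
    cdf = np.cumsum(counts) / image.size
    ref_cdf = np.cumsum(ref_counts) / reference.size
    # For each source quantile, look up the reference intensity at that quantile.
    matched = np.interp(cdf, ref_cdf, ref_values)
    return matched[inverse].reshape(image.shape)


def discretize_fixed_bin_width(image, bin_width):
    """Fixed bin-width discretization (one common convention): bin indices start
    at 1 for the bin that contains the minimum intensity."""
    bins = np.floor(image / bin_width).astype(int)
    return bins - int(np.floor(image.min() / bin_width)) + 1
```

With Norm_Mean and Norm_ROI the intensities become dimensionless ratios, so the bin width passed to the discretization step has to be chosen on the normalized scale rather than on the raw scanner scale.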
Results: Nearly 60% of the features showed poor reproducibility (ICC < 0.5) on MRI2, and the method that most affected feature reliability was Norm_ROI (average ICC of 0.45). The other two methods were similar, except for first-order features, where Norm_HM outperformed Norm_Mean (average ICC = 0.33 and 0.76 for Norm_Mean and Norm_HM, respectively). In the inter-observer setting, the number of reproducible features varied among the three structures, being higher in the prostate than in the penile bulb and the obturators. The analysis of Dfeature showed that more than 60% of the features were not consistent across normalization methods and confirmed the high reproducibility of the features between Norm_Mean and Norm_HM, whereas Norm_ROI was the least reproducible method.
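An ICC of the kind reported above could be computed per feature along the following lines. The abstract does not state which ICC form was used, so this sketch assumes the two-way random-effects, absolute-agreement, single-measurement form, often written ICC(2,1). The function name and the data layout (one row per patient, one column per normalization condition or observer) are assumptions for illustration.

```python
import numpy as np


def icc_2_1(x):
    """ICC(2,1) for an (n_subjects, k_conditions) array of one feature's values.

    Assumed form: two-way random effects, absolute agreement, single measurement.
    """
    x = np.asarray(x, dtype=float)
    n, k = x.shape
    grand = x.mean()
    row_means = x.mean(axis=1, keepdims=True)
    col_means = x.mean(axis=0, keepdims=True)

    # Mean squares from the two-way ANOVA decomposition.
    ms_rows = k * np.sum((row_means - grand) ** 2) / (n - 1)
    ms_cols = n * np.sum((col_means - grand) ** 2) / (k - 1)
    residual = x - row_means - col_means + grand
    ms_err = np.sum(residual ** 2) / ((n - 1) * (k - 1))

    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Hypothetical usage: one feature measured on 3 patients under 2 conditions.
feature_values = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]])
icc = icc_2_1(feature_values)
is_poor = icc < 0.5  # threshold for poor reproducibility used in the Results
```

Because this form measures absolute agreement, a constant offset between conditions lowers the ICC even when patients keep the same ranking, which is one way a normalization method can reduce feature reproducibility.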
Conclusions: The normalization process affects the reproducibility of radiomic features, both in terms of changes in the image information content and in the inter-observer setting. Among the considered methods, Norm_Mean and Norm_HM seem to provide the most reproducible features with respect to the ...