BACKGROUND AND PURPOSE
Corpus callosum atrophy is a neurodegenerative biomarker in multiple sclerosis (MS). Manual delineations are the gold standard but are subjective and labor intensive. Novel automated methods are promising but require validation. We aimed to compare the robustness of manual corpus callosum segmentations with that of automatic segmentations based on FreeSurfer.
METHODS
Nine MS patients (6 females, age 38 ± 13 years, disease duration 7.3 ± 5.2 years) were scanned twice, with repositioning, on each of three scanners (two 1.5 T and one 3.0 T) on the same day using 3‐dimensional T1‐weighted magnetic resonance imaging, giving six scans per patient. Normalized corpus callosum areas were measured independently by a junior doctor and a neuroradiologist. The cross‐sectional and longitudinal streams of FreeSurfer were used to segment the corpus callosum volume.
RESULTS
Manual measurements had high intrarater (junior doctor .96 and neuroradiologist .96) and interrater agreement (.94), by intraclass correlation coefficient (P < .001). The coefficient of variation was lowest for longitudinal FreeSurfer (.96% within scanners; 2.0% between scanners) compared to cross‐sectional FreeSurfer (3.7%, P = .001; 3.8%, P = .058) and the neuroradiologist (2.3%, P = .005; 2.4%, P = .33). Longitudinal FreeSurfer was also more accurate than cross‐sectional FreeSurfer (Dice scores relative to manual segmentations 83.9 ± 7.5% vs. 78.9 ± 8.4%, P < .01). The corpus callosum measures correlated with physical disability (longitudinal FreeSurfer r = –.36, P < .01; neuroradiologist r = –.32, P < .01) and cognitive disability (longitudinal FreeSurfer r = .68, P < .001; neuroradiologist r = .64, P < .001).
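As an illustration of the two summary metrics reported above, the following is a minimal sketch of how a Dice score and a coefficient of variation are commonly computed. It is not the study's analysis code: the function names, the toy masks, and the example measurements are hypothetical, and the exact formulation used in the study (for example, how repeated scans were grouped within and between scanners) is not specified here.

```python
import numpy as np


def dice_score(mask_a, mask_b):
    """Dice overlap between two binary masks, as a fraction in [0, 1]."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    return 2.0 * np.logical_and(a, b).sum() / total


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean, in percent."""
    v = np.asarray(values, dtype=float)
    return 100.0 * v.std(ddof=1) / v.mean()


# Hypothetical toy data: a manual and an automatic mask, and two
# repeated measurements of the same structure in the same subject.
manual = np.array([[0, 1, 1], [0, 1, 0]])
automatic = np.array([[0, 1, 0], [0, 1, 1]])
dice = dice_score(manual, automatic)
cov = coefficient_of_variation([100.0, 102.0])
```

A Dice score of 1 indicates identical masks and 0 indicates no overlap; a lower coefficient of variation across repeated scans of the same patient indicates better repeatability.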
CONCLUSIONS
FreeSurfer's longitudinal stream provides corpus callosum measures with better repeatability than current manual methods and with similar clinical correlations. However, due to some limitations in accuracy, caution is warranted when using FreeSurfer with clinical data.