Recently, Eklund et al. (1) analyzed clustering methods in standard fMRI packages: AFNI (which we maintain), FSL, and SPM. They claim that (i) false-positive rates (FPRs) in traditional approaches are greatly inflated, questioning the validity of "countless published fMRI studies"; (ii) nonparametric methods produce valid, but slightly conservative, FPRs; (iii) a common flawed assumption is that the spatial autocorrelation function (ACF) of fMRI noise is Gaussian-shaped; and (iv) a 15-year-old bug in AFNI's 3dClustSim significantly contributed to producing "particularly high" FPRs compared with other software. We repeated simulations from ref. 1.
AFNI and 3dClustSim
Smoothness
To test the effect of assuming a Gaussian ACF in fMRI noise, we computed an empirical "mixed ACF", which allows for longer tails, from the residuals (3). All FPRs (Fig. 1 E and F) decreased. FPRs for block designs remained >5%, likely reflecting the dependence of the noise's spatial smoothness on temporal frequency. Heavy tails in spatial smoothness indeed have significant consequences for clustering.
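To illustrate why the shape of the ACF tail matters, the following minimal sketch compares a Gaussian ACF with a mixed form consisting of a Gaussian core plus a mono-exponential tail. The functional form, a·exp(−r²/(2b²)) + (1 − a)·exp(−r/c), is assumed here to match the mixed model described in AFNI's documentation; the parameter values are hypothetical and are not estimates from any dataset.

```python
import math


def gaussian_acf(r, b):
    """Gaussian-shaped ACF at distance r (mm) with width parameter b (mm)."""
    return math.exp(-r * r / (2.0 * b * b))


def mixed_acf(r, a, b, c):
    """Assumed mixed ACF: Gaussian core (weight a) plus exponential tail (weight 1 - a)."""
    return a * gaussian_acf(r, b) + (1.0 - a) * math.exp(-r / c)


# Hypothetical parameters, chosen only for illustration.
A, B, C = 0.5, 3.0, 10.0

for r in (0.0, 5.0, 10.0, 20.0):
    print(f"r = {r:5.1f} mm  gaussian = {gaussian_acf(r, B):.3e}  "
          f"mixed = {mixed_acf(r, A, B, C):.3e}")
```

With a = 1 the mixed form reduces to the pure Gaussian. With a < 1, correlation persists at distances where the Gaussian is effectively zero, which is the heavy-tail behavior that allows larger noise-only clusters to form.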
Nonparametric Approach
A spatial model-free, nonparametric randomization approach was added to AFNI's group-level GLM program, 3dttest++ (3). All FPRs (Fig. 1 G and H) were within the nominal confidence interval. Although this approach shows promise (as in ref. 1), it may not be feasible to generalize nonparametric permutations to complicated covariate structures and models (e.g., complex ANOVA, analysis of covariance, or linear mixed effects) (4, 5).
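The general idea of a randomization-based cluster threshold can be sketched as follows. This is not the 3dttest++ implementation: it assumes a one-sample design, sign flipping of each subject's data as the randomization scheme, and a one-dimensional "brain" in which a cluster is a run of adjacent suprathreshold voxels. All function names are hypothetical.

```python
import math
import random


def t_stats(data):
    """One-sample t statistic per voxel; data is a list of per-subject lists."""
    n = len(data)
    out = []
    for v in range(len(data[0])):
        vals = [subj[v] for subj in data]
        mean = sum(vals) / n
        var = sum((x - mean) ** 2 for x in vals) / (n - 1)
        out.append(mean / math.sqrt(var / n) if var > 0 else 0.0)
    return out


def max_cluster_size(tvals, t_thresh):
    """Longest run of adjacent voxels with |t| above the threshold (1D clusters)."""
    best = run = 0
    for t in tvals:
        run = run + 1 if abs(t) > t_thresh else 0
        best = max(best, run)
    return best


def cluster_threshold(data, t_thresh, n_iter=500, alpha=0.05, seed=0):
    """Smallest cluster size exceeded by at most a fraction alpha of null maxima."""
    rng = random.Random(seed)
    null_sizes = []
    for _ in range(n_iter):
        # Randomly flip the sign of each subject's whole map (assumed null scheme).
        signs = [rng.choice((-1.0, 1.0)) for _ in data]
        flipped = [[s * x for x in subj] for s, subj in zip(signs, data)]
        null_sizes.append(max_cluster_size(t_stats(flipped), t_thresh))
    null_sizes.sort()
    idx = min(n_iter - 1, math.ceil((1.0 - alpha) * n_iter) - 1)
    return null_sizes[idx] + 1


if __name__ == "__main__":
    demo_rng = random.Random(1)
    # Simulated pure-noise data: 12 hypothetical subjects, 50 voxels each.
    noise = [[demo_rng.gauss(0.0, 1.0) for _ in range(50)] for _ in range(12)]
    print("minimum significant cluster size:", cluster_threshold(noise, 2.2))
```

Because the null distribution of the maximum cluster size is built from the data themselves, no assumption about the shape of the spatial ACF is needed. The sketch also shows why extension to complicated models is difficult: a simple sign flip is a valid null only when the design has a comparably simple exchangeability structure.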
Inflated FPRs
Several cases showed significant FPR inflation across existing fMRI software within the testing framework of ref. 1. However, deviations from nominal FPR were not uniformly large and depended strongly on several factors. Fig. 1 and figure 1 of ref. 1 show quite good cluster results for stricter per-voxel P values (which ref. 6 found to be predominantly used in fMRI analyses) and for event-related stimuli (emphasizing the importance of good experimental design): FPR inflation was often ≲10% (Beijing) or ≲5% (Cambridge), affecting only clusters with marginally significant volume.
We strongly disagree with Eklund et al.'s (1) summary statement: "Alarmingly, the parametric methods can give a very high degree of false positives (up to 70%, compared with the nominal 5%)." For comparison, their own nonparametric method's results actually showed up to 40% FPR. When characterizing results, medians or percentile ranges are generally more informative summary statistics than maxima. Looking backward, the typical ranges show much smaller FPR inflation than what had been highlighted, and looking forward they provide useful suggestions for experimental design and analyses (lower voxelwise P, event-related paradigms, etc.). By concentrating on the highest observed FPRs, the conclusions of Eklund et al. (1) are unnecessarily alarmist.