The secret lives of experiments: Methods reporting in the fMRI literature

Carp, Joshua

doi:10.1016/j.neuroimage.2012.07.004

Cited by 457 publications (392 citation statements, 382 classified as mentioning)

References 46 publications

“…In this paper the focus has been on cluster level inference [8], as it is more commonly used than voxel level inference. Two common cluster defining thresholds [9,10] were tested; p = 0.01 (z = 2.3) and p = 0.001 (z = 3.1). The parameters used for each software package are given in Table 3.…”

Section: Methods (mentioning; confidence: 99%)

Empirically investigating the statistical validity of SPM, FSL and AFNI for single subject fMRI analysis

Eklund, Nichols, Andersson, et al. 2015

2015 IEEE 12th International Symposium on Biomedical Imaging (ISBI)


The software packages SPM, FSL and AFNI are the most widely used packages for the analysis of functional magnetic resonance imaging (fMRI) data. Despite this, the validity of their statistical methods has only been tested using simulated data. By analyzing resting-state fMRI data (which should not contain specific forms of brain activity) from 396 healthy controls, we show that all three software packages give inflated false positive rates (4%-96% compared to the expected 5%). We isolate the sources of these problems and find that SPM mainly suffers from a too simple noise model, while FSL underestimates the spatial smoothness. These results highlight the need to validate the statistical methods being used for fMRI.
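The two cluster-defining thresholds quoted above pair a voxelwise p value with a z threshold, and the abstract frames validity as an empirical false positive rate measured on null (resting-state) data. A minimal Python sketch of both calculations follows, assuming one-sided thresholds and assuming the FPR is estimated as the fraction of null analyses that yield at least one significant cluster, with a normal-approximation 95% interval. The function names and the 50-out-of-1000 count are hypothetical illustrations, not code or results from either paper.

```python
import math
from statistics import NormalDist


def cluster_defining_z(p: float) -> float:
    """One-sided z threshold matching a voxelwise p value: z = Phi^-1(1 - p)."""
    return NormalDist().inv_cdf(1.0 - p)


def familywise_fpr(n_with_detection: int, n_null_analyses: int, z: float = 1.96):
    """Empirical FPR on null data plus a normal-approximation confidence interval.

    Assumption (illustrative): a null analysis counts as one false positive
    if it produced at least one significant cluster anywhere in the brain.
    """
    fpr = n_with_detection / n_null_analyses
    half_width = z * math.sqrt(fpr * (1.0 - fpr) / n_null_analyses)
    return fpr, max(0.0, fpr - half_width), min(1.0, fpr + half_width)


# p = 0.01 -> z = 2.3 and p = 0.001 -> z = 3.1, the two thresholds quoted above.
print(round(cluster_defining_z(0.01), 1), round(cluster_defining_z(0.001), 1))  # 2.3 3.1

# Hypothetical count: 50 of 1000 null analyses with at least one cluster.
print(familywise_fpr(50, 1000))  # roughly (0.05, 0.036, 0.064)
```

With 1000 null analyses, the interval around a nominal 5% spans roughly 3.6% to 6.4%, which gives a sense of how far an observed FPR must stray before it signals a genuinely invalid method rather than sampling noise.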


“…1 show quite good cluster results for stricter per-voxel P values (which ref. 6 found to be predominantly used in fMRI analyses) and for event-related stimuli (emphasizing the importance of good experimental design): FPR inflation was often ≲10% (Beijing) or ≲5% (Cambridge), affecting only clusters with marginally significant volume.…”

Section: Inflated FPRs (mentioning; confidence: 99%)

fMRI clustering and false-positive rates

Cox, Chen, Glen, et al. 2017

Proc. Natl. Acad. Sci. U.S.A.


Recently, Eklund et al. (1) analyzed clustering methods in standard fMRI packages: AFNI (which we maintain), FSL, and SPM. They claim that (i) false-positive rates (FPRs) in traditional approaches are greatly inflated, questioning the validity of "countless published fMRI studies"; (ii) nonparametric methods produce valid, but slightly conservative, FPRs; (iii) a common flawed assumption is that the spatial autocorrelation function (ACF) of fMRI noise is Gaussian-shaped; and (iv) a 15-year-old bug in AFNI's 3dClustSim significantly contributed to producing "particularly high" FPRs compared with other software. We repeated simulations from ref. …

AFNI and 3dClustSim Smoothness

To test the effect of assuming a Gaussian ACF in fMRI noise, an empirical "mixed ACF" allowing for longer tails was computed from residuals (3). All FPRs (Fig. 1 E and F) decreased. FPRs for block designs remained above 5%, likely reflecting dependence of the noise's spatial smoothness on temporal frequency. Heavy tails in spatial smoothness indeed have significant consequences for clustering.

Nonparametric Approach

A spatial-model-free, nonparametric randomization approach was added to AFNI's group-level GLM program, 3dttest++ (3). All FPRs (Fig. 1 G and H) were within the nominal confidence interval. Although this approach shows promise (as in ref. 1), it may not be feasible to generalize nonparametric permutations to complicated covariate structures and models (e.g., complex ANOVA, analysis of covariance, or linear mixed effects) (4, 5).

Inflated FPRs

Several cases showed significant FPR inflation across existing fMRI software within the testing framework of ref. 1. However, deviations from nominal FPR were not uniformly large and depended strongly on several factors. Fig. 1 and figure 1 of ref. 1 show quite good cluster results for stricter per-voxel P values (which ref. 6 found to be predominantly used in fMRI analyses) and for event-related stimuli (emphasizing the importance of good experimental design): FPR inflation was often ≲10% (Beijing) or ≲5% (Cambridge), affecting only clusters with marginally significant volume.

We strongly disagree with Eklund et al.'s (1) summary statement: "Alarmingly, the parametric methods can give a very high degree of false positives (up to 70%, compared with the nominal 5%)." For comparison, their own nonparametric method's results actually showed up to 40% FPR. When characterizing results, medians or percentile ranges are generally more informative summary statistics than maxima. Looking backward, the typical ranges show much smaller FPR inflation than what had been highlighted, and looking forward they provide useful suggestions for experimental design and analyses (lower voxelwise P, event-related paradigms, etc.). By concentrating on the highest observed FPRs, the conclusions of Eklund et al. (1) are unnecessarily alarmist.
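The abstract mentions a "spatial model-free, nonparametric randomization approach" for group-level tests. The general idea behind such randomization tests can be sketched with sign flipping for a one-sample group test: if the null hypothesis is a zero-centered, symmetric effect, each subject's contrast map can have its sign flipped at random, and the maximum statistic across voxels over many flips yields a familywise-corrected null distribution. This pure-Python sketch is voxelwise rather than cluster-level and one-sided (positive effects only); it is not a reproduction of AFNI's 3dttest++ code, and the function names, data layout (`data[subject][voxel]`) and permutation count are assumptions.

```python
import random
import statistics


def one_sample_t(values):
    """One-sample t statistic for H0: mean = 0."""
    n = len(values)
    return statistics.fmean(values) / (statistics.stdev(values) / n ** 0.5)


def voxelwise_t(data):
    """data[s][v] is subject s's contrast estimate at voxel v (hypothetical layout)."""
    n_vox = len(data[0])
    return [one_sample_t([row[v] for row in data]) for v in range(n_vox)]


def max_t_sign_flip(data, n_perm=1000, seed=0):
    """One-sided FWE-corrected p values from the max-t distribution under sign flips."""
    rng = random.Random(seed)
    observed = voxelwise_t(data)
    max_null = []
    for _ in range(n_perm):
        # Flip every voxel of a subject's map by the same random sign,
        # which preserves the spatial correlation within each map.
        flipped = []
        for row in data:
            sign = rng.choice((-1.0, 1.0))
            flipped.append([sign * x for x in row])
        # The largest t across voxels controls the family-wise error rate.
        max_null.append(max(voxelwise_t(flipped)))
    # Count the unpermuted data as one member of the null set (the "+ 1").
    return [(1 + sum(m >= t for m in max_null)) / (n_perm + 1) for t in observed]


# Toy data: 12 subjects, 3 voxels; only voxel 0 carries a true effect.
demo_rng = random.Random(1)
demo = [[3.0 + demo_rng.gauss(0, 1), demo_rng.gauss(0, 1), demo_rng.gauss(0, 1)]
        for _ in range(12)]
print(max_t_sign_flip(demo, n_perm=500))
```

Because no spatial smoothness model is assumed, the corrected threshold adapts to whatever autocorrelation the data actually have; the cost, as the letter notes, is that extending such permutation schemes to complex covariate structures is not straightforward.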


“…Researchers can influence their data during undocumented analysis and pre-processing steps and by the mere choice of structuring the data (constituting researcher degrees of freedom; Simmons et al, 2011). This is particularly a problem in neuroimaging where the complexity and idiosyncrasy of analyses is such that it is usually impossible to replicate exactly what happened and why during data analysis (Kriegeskorte et al, 2009;Vul et al, 2009;Carp, 2012). Another term that has been used to describe the impact of diverse analytical choices is "vibration of effects" (Ioannidis, 2008).…”

Section: NHST May Foster Selective Reporting and Subjectivity (mentioning; confidence: 99%)
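The "vibration of effects" point can be made concrete with a little arithmetic. If a researcher tries k analysis variants on null data and reports whichever reaches p < α, the chance of at least one false positive is 1 - (1 - α)^k when the variants are independent. The sketch below computes that figure and checks it by simulation; independence is an assumption made purely for illustration (real pipelines reuse the same data and are correlated, so the actual inflation is typically smaller), and the function names are hypothetical.

```python
import random


def analytic_inflation(alpha: float, k: int) -> float:
    """P(at least one p < alpha among k independent null tests)."""
    return 1 - (1 - alpha) ** k


def simulated_inflation(alpha: float, k: int, n_sims: int = 20000, seed: int = 0) -> float:
    """Monte Carlo check: under the null, each p value is Uniform(0, 1)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        # "Report the best of k variants": keep only the smallest p value.
        best_p = min(rng.random() for _ in range(k))
        if best_p < alpha:
            hits += 1
    return hits / n_sims


print(round(analytic_inflation(0.05, 5), 3))  # 0.226
print(simulated_inflation(0.05, 5))           # close to the analytic value
```

Even five undocumented variants push a nominal 5% error rate above 20%, which is why the passage ties analytic flexibility to replicability.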

“…This casts doubts on a substantial part of the published fMRI literature. Further, Carp (2012) reported that about 40% of 241 relatively recent fMRI papers actually did not report having used multiple testing correction. So, a very high percentage of fMRI literature may have been exposed to high false positive rates either multiple correction was used or not (see also (Szucs and Ioannidis, 2017) on statistical power).…”

Section: Family-Wise Error Rate (FWER) and FDR Correction in NHST (mentioning; confidence: 99%)
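The passage turns on whether papers reported any multiple testing correction. The two families named in the section differ in what they control: Bonferroni bounds the family-wise error rate (the chance of any false positive), while the Benjamini-Hochberg step-up procedure bounds the false discovery rate (the expected share of false positives among rejections). A minimal sketch on a short list of p values follows; it is not the voxelwise or cluster-level correction that SPM, FSL or AFNI implement, and the example p values are made up.

```python
def bonferroni(pvals, alpha=0.05):
    """Reject H0_i when p_i <= alpha / m (controls the FWER)."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def benjamini_hochberg(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (controls the FDR)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    # Find the largest rank k whose sorted p value satisfies p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * alpha:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max.
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            rejected[i] = True
    return rejected


pvals = [0.001, 0.008, 0.02, 0.035, 0.3]  # made-up example
print(bonferroni(pvals))          # [True, True, False, False, False]
print(benjamini_hochberg(pvals))  # [True, True, True, True, False]
```

The FDR procedure rejects more hypotheses here because it tolerates a controlled proportion of false discoveries instead of guarding against any single one.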

“…Hence, competitors will not be able to scoop good ideas before the study is published. Considering the extreme analysis flexibility offered by high-dimensional neuroscience data (Kriegeskorte et al, 2009;Vul et al, 2009;Carp, 2012) pre-registration seems a necessary pre-condition of robust hypothesis driven neuroscience research. Pre-registration would likely help to cleanse non-replicable "unchallenged fallacies" (Ioannidis, 2012) from the literature.…”

Section: Pre-registration (mentioning; confidence: 99%)


When null hypothesis significance testing is unsuitable for research: a reassessment

Szűcs, Ioannidis 2016

Preprint


Null hypothesis significance testing (NHST) has several shortcomings that are likely contributing factors behind the widely debated replication crisis of (cognitive) neuroscience, psychology, and biomedical science in general. We review these shortcomings and suggest that, after sustained negative experience, NHST should no longer be the default, dominant statistical practice of all biomedical and psychological research. If theoretical predictions are weak, we should not rely on all-or-nothing hypothesis tests. Different inferential methods may be most suitable for different types of research questions. Whenever researchers use NHST, they should justify its use and publish pre-study power calculations and effect sizes, including negative findings. Hypothesis-testing studies should be pre-registered and, optimally, their raw data should be published. The current "statistics-lite" educational approach for students, which has sustained the widespread, spurious use of NHST, should be phased out.
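The abstract asks researchers to publish pre-study power calculations. One common back-of-the-envelope version, for comparing two independent groups with a standardized effect size d (Cohen's d), uses the normal approximation n = 2((z_{1-α/2} + z_{1-β}) / d)^2 per group. The sketch below assumes a two-sided test and equal group sizes; the exact t-test calculation gives a slightly larger n (64 rather than 63 per group for d = 0.5), and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate per-group sample size for a two-sided, two-sample comparison.

    Normal approximation: n = 2 * ((z_{1 - alpha/2} + z_{power}) / d) ** 2,
    rounded up to a whole participant.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)           # about 0.84 for 80% power
    return math.ceil(2 * ((z_alpha + z_beta) / d) ** 2)


print(n_per_group(0.5))  # 63
print(n_per_group(0.8))  # 25
```

Writing the calculation down before data collection fixes the sample size in advance, which is the point of reporting it alongside pre-registration.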
