In theory, a comparison of two experimental effects requires a statistical test on their difference. In practice, this comparison is often based on an incorrect procedure involving two separate tests in which researchers conclude that effects differ when one effect is significant (P < 0.05) but the other is not (P > 0.05). We reviewed 513 behavioral, systems and cognitive neuroscience articles in five top-ranking journals (Science, Nature, Nature Neuroscience, Neuron and The Journal of Neuroscience) and found that 78 used the correct procedure and 79 used the incorrect procedure. An additional analysis suggests that incorrect analyses of interactions are even more common in cellular and molecular neuroscience. We discuss scenarios in which the erroneous procedure is particularly beguiling.

"The percentage of neurons showing cue-related activity increased with training in the mutant mice (P < 0.05), but not in the control mice (P > 0.05)."

"Animals receiving vehicle (control) infusions into the amygdala showed increased freezing to the conditioned stimulus compared with a control stimulus (P < 0.01); in animals receiving muscimol infusions into the amygdala, this difference was abolished (F < 1)."

These two fictive, but representative, statements illustrate a statistical error that is common in the neuroscience literature. The researchers who made these statements wanted to claim that one effect (for example, the training effect on neuronal activity in mutant mice) was larger or smaller than the other effect (the training effect in control mice). To support this claim, they needed to report a statistically significant interaction (between amount of training and type of mice), but instead they reported that one effect was statistically significant, whereas the other effect was not.
Although superficially compelling, the latter type of statistical reasoning is erroneous because the difference between significant and not significant need not itself be statistically significant [1]. Consider an extreme scenario in which training-induced activity barely reaches significance in mutant mice (for example, P = 0.049) and barely fails to reach significance for control mice (for example, P = 0.051). Despite the fact that these two P values lie on opposite sides of 0.05, one cannot conclude that the training effect for mutant mice differs statistically from that for control mice. That is, as famously noted by Rosnow and Rosenthal [2], "surely, God loves the 0.06 nearly as much as the 0.05". Thus, when making a comparison between two effects, researchers should report the statistical significance of their difference rather than the difference between their significance levels.

Our impression was that this error of comparing significance levels is widespread in the neuroscience literature, but until now there were no aggregate data to support this impression. We therefore examined all of the behavioral, systems and cognitive neuroscience studies published in four prestigious journals (Nature, Science, Nature Neuroscience and Neuron) ...
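The extreme scenario above can be made concrete with a short numerical sketch. Assuming, purely for illustration, that both effect estimates come from normally distributed statistics with equal standard errors (so each z score equals its effect size in standard-error units), the correct procedure tests the difference of the two effects with a z-test, z = (b1 − b2) / sqrt(SE1² + SE2²). The function names below are hypothetical helpers, not from any particular library:

```python
import math

def p_two_sided(z):
    """Two-sided P value for a standard-normal test statistic."""
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))

def z_from_p(p):
    """Invert p_two_sided by bisection (dependency-free, for illustration)."""
    lo, hi = 0.0, 10.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if p_two_sided(mid) > p:
            lo = mid  # mid is too small: P still above target
        else:
            hi = mid
    return (lo + hi) / 2.0

# Two effects that barely straddle the 0.05 threshold, as in the text:
z_mutant = z_from_p(0.049)   # "significant" effect
z_control = z_from_p(0.051)  # "not significant" effect

# Assumed equal standard errors of 1 for both estimates (illustrative only).
# The correct comparison tests the *difference* between the effects:
z_diff = (z_mutant - z_control) / math.sqrt(1.0**2 + 1.0**2)
p_diff = p_two_sided(z_diff)
print(p_diff)  # far above 0.05: no evidence the effects differ
```

Under these assumptions the two effect sizes are nearly identical, so the P value for their difference is close to 1: the pair of individual tests (P = 0.049 vs. P = 0.051) says essentially nothing about whether the effects differ.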