How to Be a Statistical Detective

Statistical errors are surprisingly common in the biomedical literature and contribute to the reproducibility crisis [1]. Some errors are impossible to spot without access to the underlying dataset, but many are detectable from information available in the paper itself. You don't need a degree in statistics to catch most of these errors; common sense and simple arithmetic are often all that's required. In addition, a growing number of free, easy-to-use online statistical sleuthing tools facilitate error detection. This article reviews simple techniques and tools you can use to catch errors when reviewing others' papers or when double-checking your own work.
Use Common Sense

In some cases, reported statistics and results don't make sense at face value, so one of the best tools in the statistical detective's toolkit is common sense. For example, I was once reviewing a meta-analysis of a nonsurgical treatment for knee pain that reported a pooled effect size of a 4.04-standard deviation (SD) reduction in pain in treated patients (95% confidence interval: 2.81 to 5.26). This is an implausibly large effect size; note that a 0.8-SD effect is commonly used as the benchmark for a "large" effect. Figure 1 illustrates what the pre- and posttreatment distributions of pain scores would have to look like to achieve a 4.04-SD reduction in pain: most patients would have to have very high pain scores before treatment and very low pain scores after treatment. (Alternatively, the pretreatment variability in pain scores would have to be implausibly low.) It's unlikely that any treatment for knee pain can achieve such a consistent and dramatic reduction in pain. A little further statistical detective work revealed the error in this case: when performing the meta-analysis, the authors had erroneously plugged in standard errors rather than SDs from the original papers.
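To see why this particular mix-up is so consequential, recall that the standard error (SE) is the SD divided by the square root of the sample size, so substituting SEs for SDs inflates a standardized effect by a factor of the square root of n. The following is a minimal sketch, using entirely hypothetical numbers (a 50-patient study with a 2-point mean pain reduction and a 2.5-point SD), of how a plausible 0.8-SD effect balloons into an implausible 5.7-SD one:

    # Hypothetical illustration: mistaking the standard error (SE) for the
    # standard deviation (SD) inflates a standardized mean difference by
    # a factor of sqrt(n), because SE = SD / sqrt(n).
    import math

    n = 50                # hypothetical sample size
    mean_change = 2.0     # hypothetical mean pain reduction (points)
    sd = 2.5              # hypothetical SD of pain scores (points)
    se = sd / math.sqrt(n)

    d_correct = mean_change / sd   # standardized effect using the SD
    d_inflated = mean_change / se  # the error: dividing by the SE instead

    print(f"correct effect size:  {d_correct:.2f} SD")   # 0.80 SD
    print(f"inflated effect size: {d_inflated:.2f} SD")  # 5.66 SD

Because the inflation factor grows with sample size, the largest studies in a meta-analysis are distorted the most when this error is made, which is one reason a common-sense plausibility check on the pooled estimate is so valuable.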
Do Simple Arithmetic

Many important statistical errors can be revealed through simple arithmetic. When numbers within or across tables and figures don't add up, this may signal larger problems with data management or analysis. For example, Figure 2 shows a table from a published paper that contains numerous statistical and numerical inconsistencies [2]. This paper was brought to my attention because its second author has had 17 other papers retracted to date [3]. The paper explored the effects of labeling physical activity as "fun" vs "exercise" on participants' subsequent food consumption.

A quick scan of the table (Figure 2) reveals simple numerical problems that should have been caught during peer review. For example, though each group had 28 participants, a sample size of n = 29 is listed for the exercise framing group in the "Drink chosen" column. The explanation reveals an error: n = 29 refers to the number of drinks rather than the number of people, and the authors calculated the mean calories per drink rather than the quantity of interest: the mean drink calories per person. (The calories per drink from four cokes of ...