2017
DOI: 10.7717/peerj.3544

The earth is flat (p > 0.05): significance thresholds and the crisis of unreplicable research

Abstract: The widespread use of ‘statistical significance’ as a license for making a claim of a scientific finding leads to considerable distortion of the scientific process (according to the American Statistical Association). We review why degrading p-values into ‘significant’ and ‘nonsignificant’ contributes to making studies irreproducible, or to making them seem irreproducible. A major problem is that we tend to take small p-values at face value, but mistrust results with larger p-values. In either case, p-values te…
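To illustrate the abstract's point that sorting p-values into 'significant' and 'nonsignificant' can make studies seem irreproducible, here is a minimal Python sketch that is not taken from the paper. It simulates many identical replicate studies of the same true effect and records each study's p-value. The sample size, effect size, known standard deviation, seed, and the helper name replicate_p_values are all hypothetical. The test is a two-sided one-sample z-test, chosen instead of a t test only to keep the code within the standard library.

```python
import random
from statistics import NormalDist, fmean


def replicate_p_values(n_studies=2000, n=20, true_effect=0.5, seed=1):
    """Two-sided z-test p-values from identical replicate studies.

    Hypothetical setup: each study draws n observations from a normal
    distribution with mean `true_effect` and known SD 1, and tests H0: mean = 0.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    p_values = []
    for _ in range(n_studies):
        sample = [rng.gauss(true_effect, 1.0) for _ in range(n)]
        z = fmean(sample) * n ** 0.5  # known SD 1, so SE = 1/sqrt(n)
        p_values.append(2 * (1 - std_normal.cdf(abs(z))))
    return p_values


p_values = replicate_p_values()
share_significant = sum(p < 0.05 for p in p_values) / len(p_values)
print(f"'significant' replicates: {share_significant:.0%}")
print(f"p-value range: {min(p_values):.2g} to {max(p_values):.2g}")
```

With these settings the expected share of 'significant' replicates is roughly 60% (the power of this test). So about four in ten exact replications of a real effect would be labelled 'nonsignificant', even though nothing about the effect differs between them.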

Cited by 266 publications (229 citation statements)
References: 216 publications (435 reference statements)
“…The paired Student t tests and the correlation analyses were made not only with data derived from this study, but also recurring to the original data sets derived from the same fish, as provided by the authors (Poleksić et al, ). As a measure of graded evidence (Amrhein, Korner‐Nievergelt, & Roth, ), the significance level (α) was set at 5%.…”
Section: Methods (mentioning)
confidence: 99%
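The paired Student t test mentioned in this statement works on per-subject differences. The t statistic is the mean difference divided by its standard error, with n − 1 degrees of freedom. Below is a standard-library Python sketch using made-up data. The two-sided p-value is obtained by numerically integrating the t density with Simpson's rule, which is adequate for moderate |t| but is only an illustration, not a validated routine. In practice a library function such as scipy.stats.ttest_rel would normally be used. The function names are hypothetical. Printing the exact p next to α = 0.05, rather than only a verdict, follows the 'graded evidence' reading the statement attributes to Amrhein et al.

```python
import math
from statistics import mean, stdev


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value: 1 - 2 * P(0 < T < |t|), integrated with Simpson's rule.

    Numerical sketch only; accurate for moderate |t|. `steps` must be even.
    """
    a = abs(t)
    if a == 0:
        return 1.0
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3)


def paired_t_test(before, after):
    """Paired Student t test on per-subject differences; returns (t, df, p)."""
    diffs = [y - x for x, y in zip(before, after)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1, two_sided_p(t, n - 1)


# Hypothetical paired measurements on the same four subjects.
before = [1.0, 2.0, 3.0, 4.0]
after = [2.0, 4.0, 6.0, 8.0]
t, df, p = paired_t_test(before, after)
print(f"t({df}) = {t:.3f}, p = {p:.4f} (alpha = 0.05)")
```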
“…Interestingly, there is a Bayesian line of argumentation to justify the LOD score threshold of 3, showing that under a number of assumptions, this threshold yields a false-positive frequency of about 5% (see also Khoury, Beaty, & Cohen, 1993). Another recommendation is not to rely on thresholds as such but to use and interpret p values differently as "graded measures of the strength of evidence against H₀" (Fisher, 1956) or as "measure of surprise: the smaller it is, the more surprising the results are if H₀ is true" (Amrhein, Korner-Nievergelt, & Roth, 2017).…”
Section: The Reproducibility Crisis: P Values and Significance Thre… (mentioning)
confidence: 99%
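The Bayesian argument for the LOD threshold of 3 mentioned in this statement can be sketched as simple arithmetic. A LOD score is the base-10 logarithm of a likelihood ratio (linkage versus no linkage), so a LOD of 3 corresponds to a likelihood ratio of 1000. Multiplying that ratio by the prior odds gives the posterior odds. The prior probability of linkage used below (1 in 50) is an assumption chosen for illustration, and the function name is hypothetical. This is a simplified version of the kind of argument the passage refers to, not the derivation in the cited sources.

```python
def false_positive_probability(lod, prior_linkage):
    """Posterior probability of NO linkage given an observed LOD score.

    Simplified Bayes-factor reasoning (an illustrative assumption): the LOD score
    is log10 of the likelihood ratio for linkage vs. no linkage, so
    posterior odds of linkage = prior odds * 10**lod.
    """
    prior_odds = prior_linkage / (1 - prior_linkage)
    posterior_odds = prior_odds * 10 ** lod
    return 1 / (1 + posterior_odds)


# Hypothetical prior: 1 in 50 chance that two arbitrary loci are linked.
fp = false_positive_probability(lod=3, prior_linkage=0.02)
print(f"false-positive probability at LOD 3: {fp:.1%}")  # prints 4.7%
```

Under this assumed prior the result is about 4.7%, in line with the 'about 5%' quoted above. A smaller prior probability of linkage would push the false-positive frequency up, which illustrates one of the 'number of assumptions' such an argument depends on.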
“…Unlike the Neyman-Pearson approach, Fisherian p values are interpreted on a sliding scale of probability in which the smaller the p value, the less likely it is that the observed data would occur if the null hypothesis was true (Amrhein, Korner-Nievergelt, & Roth, 2017; Biau, Jolles, & Porcher, 2010; Bradley & Brand, 2016; Falissard, 2012; Haig, 2016; Hubbard & Bayarri, 2003; Wasserstein & Lazar, 2016). Hence, from a Fisherian perspective, p = .0004 indicates stronger evidence against the null hypothesis than p = .04 (Hubbard & Lindsay, 2008).…”
Section: Abandon the Neyman-Pearson Approach (mentioning)
confidence: 99%
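The contrast between the two readings in this statement can be shown with its own numbers. The Python sketch below applies a Neyman-Pearson decision at α = 0.05. As one assumed way of putting numbers on a graded 'measure of surprise', it also computes the transform −log2(p), often called the S-value or surprisal. That transform is not used in the quoted text and serves only as an illustration. The helper names are hypothetical.

```python
import math

ALPHA = 0.05  # conventional Neyman-Pearson threshold, fixed before seeing data


def neyman_pearson_decision(p, alpha=ALPHA):
    """Binary decision: how far p lies below the cut-off is ignored."""
    return "reject H0" if p <= alpha else "do not reject H0"


def surprisal_bits(p):
    """Graded reading of p as -log2(p): bits of surprise against H0."""
    return -math.log2(p)


for p in (0.04, 0.0004, 0.06):
    print(f"p = {p:<7} {neyman_pearson_decision(p):<17} {surprisal_bits(p):5.2f} bits")
```

The binary rule treats p = .04 and p = .0004 identically, while p = .04 and p = .06 receive opposite verdicts despite carrying similar surprisal (about 4.6 versus 4.1 bits, compared with about 11.3 bits for p = .0004).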
“…The typical method of conducting null hypothesis significance tests often represents a hybrid of the Neyman-Pearson and Fisherian approaches (Amrhein et al, 2017; Biau et al, 2010; Bradley & Brand, 2016; Hubbard & Bayarri, 2003; Schneider, 2015). I discuss the Fisherian approach later on in this article.…”
Section: Endnotes (mentioning)
confidence: 99%
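In the hybrid described in this statement, α comes from the Neyman-Pearson framework. There it is a long-run error rate fixed before the data are seen, whereas the exact p-value is read in Fisher's per-study sense. The long-run property can be checked with a small simulation. The sketch below assumes normal data with known SD 1, a two-sided z-test, and a true null hypothesis. All parameter values, the seed, and the function name are hypothetical.

```python
import random
from statistics import NormalDist, fmean


def type_i_error_rate(n_studies=4000, n=20, alpha=0.05, seed=7):
    """Share of studies that reject a TRUE null hypothesis (mean = 0) at level alpha.

    Hypothetical setup: normal data with known SD 1, two-sided z-test.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    rejections = 0
    for _ in range(n_studies):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]  # H0 is true: mean 0
        z = fmean(sample) * n ** 0.5  # known SD 1, so SE = 1/sqrt(n)
        p = 2 * (1 - std_normal.cdf(abs(z)))
        if p <= alpha:
            rejections += 1
    return rejections / n_studies


rate = type_i_error_rate()
print(f"share of true nulls rejected at alpha = 0.05: {rate:.3f}")
```

The rejection rate comes out close to α. This is a statement about the procedure over many studies and says nothing about how strong the evidence is in any single study, which is the Fisherian question.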