We investigated the way experienced users interpret Null Hypothesis Significance Testing (NHST) outcomes. An empirical study was designed to compare the reactions of two populations of NHST users, psychological researchers and professional applied statisticians, when faced with contradictory situations. The subjects were presented with the results of an experiment designed to test the efficacy of a drug by comparing two groups (treatment/placebo). Four situations were constructed by combining the outcome of the t test (significant vs. nonsignificant) and the observed difference between the two means D (large vs. small). Two of these situations appeared as conflicting (t significant/D small and t nonsignificant/D large). Three fundamental aspects of statistical inference were investigated by means of open questions: drawing inductive conclusions about the magnitude of the true difference from the data in hand, making predictions for future data, and making decisions about stopping the experiment. The subjects were 25 statisticians from pharmaceutical companies in France, all well versed in statistics, and 20 psychological researchers from various laboratories in France, all with experience in processing and analyzing experimental data. On the whole, statisticians and psychologists reacted in a similar way and were very impressed by significant results. It should be emphasized that professional applied statisticians were not immune to misinterpretations, especially in the case of nonsignificance. However, the interpretations that experienced users attach to the outcome of NHST can vary from one individual to another, and it is hard to conceive that there could be a consensus in the face of seemingly conflicting situations. In fact, beyond the superficial report of "erroneous" interpretations, the misuses of NHST can be seen as intuitive judgmental "adjustments" that try to overcome its inherent shortcomings.
These findings encourage the many recent attempts to improve the habitual ways of analyzing and reporting experimental data.
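The four situations used in the study (t significant or nonsignificant, crossed with D large or small) can be illustrated with a minimal sketch. All numbers here are hypothetical and are not the values shown to the subjects: the standard deviation, group sizes, and the "large" and "small" values of D are assumptions chosen only to show that the outcome of the t test depends on sample size as much as on D. The critical values are approximate two-sided 5% values of Student's t taken from standard tables.

```python
from math import sqrt


def two_sample_t(d, pooled_sd, n_per_group):
    """Equal-n two-sample t statistic for an observed mean difference d."""
    return d / (pooled_sd * sqrt(2.0 / n_per_group))


def classify(d, pooled_sd, n_per_group, t_crit):
    """Return (t, significant) using a supplied two-sided critical value."""
    t = two_sample_t(d, pooled_sd, n_per_group)
    return t, abs(t) > t_crit


# Hypothetical scenarios: (label, D, pooled SD, n per group, approx. t_crit).
# t_crit values are tabulated two-sided 5% points for df = 2n - 2:
# df = 100 -> 1.984, df = 4000 -> about 1.961, df = 6 -> 2.447.
SCENARIOS = [
    ("t significant / D large", 5.0, 5.0, 51, 1.984),
    ("t significant / D small", 0.5, 5.0, 2001, 1.961),
    ("t nonsignificant / D large", 5.0, 5.0, 4, 2.447),
    ("t nonsignificant / D small", 0.5, 5.0, 51, 1.984),
]

for label, d, sd, n, t_crit in SCENARIOS:
    t, significant = classify(d, sd, n, t_crit)
    print(f"{label}: D={d}, n={n} per group, t={t:.2f}, significant={significant}")
```

In this sketch the two "conflicting" situations arise purely from the group sizes: a small D becomes significant with very large groups, and a large D fails to reach significance with very small ones.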
P. R. Killeen's (2005a) probability of replication (p_rep) of an experimental result is the fiducial Bayesian predictive probability of finding a same-sign effect in a replication of an experiment. p_rep is now routinely reported in Psychological Science and has also begun to appear in other journals. However, there is little concrete, practical guidance for use of p_rep, and the procedure has not received the scrutiny that it deserves. Furthermore, only a solution that assumes a known variance has been implemented. A practical problem with p_rep is identified: In many articles, p_rep appears to be incorrectly computed, due to the confusion between 1-tailed and 2-tailed p values. Experimental findings reveal the risk of misinterpreting p_rep as the predictive probability of finding a same-sign and significant effect in a replication (p_srep). Conceptual and practical guidelines are given to avoid these pitfalls. They include an extension to the case of unknown variance. Moreover, other uses of fiducial Bayesian predictive probabilities for analyzing, designing ("how many subjects?"), and monitoring ("when to stop?") experiments are presented. Concluding remarks emphasize the role of predictive procedures in statistical methodology.
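The two pitfalls named in this abstract can be made concrete with a short sketch of the known-variance case. It assumes a replication with the same sample size as the original experiment, so that the predictive distribution of the replication z statistic is normal with mean z_obs and variance 2. The function names, the `two_tailed` flag, and the default `alpha` are illustrative choices, not taken from the article, and the unknown-variance extension mentioned above is not covered.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def _z_obs(p, two_tailed):
    """Observed z statistic recovered from a reported p value."""
    p_one_tailed = p / 2.0 if two_tailed else p
    return _STD_NORMAL.inv_cdf(1.0 - p_one_tailed)


def p_rep_from_p(p, two_tailed=True):
    """Predictive probability of a same-sign effect in an equal-n replication."""
    return _STD_NORMAL.cdf(_z_obs(p, two_tailed) / sqrt(2.0))


def p_srep_from_p(p, two_tailed=True, alpha=0.05):
    """Predictive probability of a same-sign AND significant replication.

    Significance is judged at the two-sided level alpha (an assumption).
    """
    z_crit = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    return _STD_NORMAL.cdf((_z_obs(p, two_tailed) - z_crit) / sqrt(2.0))


reported_p = 0.05
# Correct reading: the reported p value is two-tailed (about 0.92).
print(f"p_rep, p read as 2-tailed: {p_rep_from_p(reported_p):.3f}")
# Tail confusion: the same p value wrongly treated as one-tailed.
print(f"p_rep, p read as 1-tailed: {p_rep_from_p(reported_p, two_tailed=False):.3f}")
# Same-sign and significant replication: 0.5 when z_obs equals z_crit.
print(f"p_srep:                    {p_srep_from_p(reported_p):.3f}")
```

Under these assumptions a result that is just significant at the two-sided .05 level has a p_rep of about 0.92 but a p_srep of only 0.5, which is why reading p_rep as the chance of a significant replication is misleading.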
The current context of the "significance test controversy" is first briefly discussed. Then experimental studies about the use of null hypothesis significance tests by scientific researchers and applied statisticians are presented. The misuses of these tests are reconsidered as judgmental adjustments that reveal what researchers require of statistical inference. Lastly, alternative methods are considered. This naturally raises the question: "won't the Bayesian choice be unavoidable?"