The issues that have dominated the NHST discussion in the psychological literature are that (1) NHST tempts the user into confusing the probability of the hypothesis given the data with the probability of the data given the hypothesis; (2) .05 is an arbitrary criterion for significance; and (3) in real-world applications, the null hypothesis is never exactly true, and will therefore always be rejected as the number of observations grows large.
In the statistical literature, the pros and cons of NHST are also the topic of an ongoing dispute (e.g., Berger & Wolpert, 1988; O'Hagan & Forster, 2004; Royall, 1997; Sellke, Bayarri, & Berger, 2001; Stuart, Ord, & Arnold, 1999). A comparison of these two literatures shows that in psychology, the NHST discussion has focused mostly on problems of interpretation, whereas in statistics, the NHST discussion has focused mostly on problems of formal construction. The statistical perspective on the problems associated with NHST is therefore fundamentally different from the psychological perspective. In this article, the goal is to explain NHST and its problems from a statistical perspective. Many psychologists are oblivious to certain statistical problems associated with NHST, and the examples below show that this ignorance can have important ramifications.
In this article, I will show that an NHST p value depends on data that were never observed: The p value is a tail-area integral, and this integral is effectively over data that are not observed but only hypothesized. The probability of these hypothesized data depends crucially on the possibly unknown subjective intentions of the researcher who carried out the experiment. If these intentions were to be ignored, a user of NHST could always obtain a significant result through optional stopping (i.e., analyzing the data as they accumulate and stopping the experiment whenever the p value reaches some desired significance level).
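The optional-stopping problem described above can be illustrated with a small simulation. The following is a minimal sketch, not taken from the article: it assumes standard normal data with known variance, a two-sided z test, and arbitrary illustrative settings (the function names, 2,000 simulated experiments, checks after every observation from n = 10 to n = 200, and the random seed are all hypothetical choices). The null hypothesis is true in every simulated experiment, so any "significant" result is a false positive.

```python
import math
import random


def two_sided_p(z):
    """Two-sided p value for a z statistic under the standard normal."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def run_experiment(rng, n_min, n_max, alpha, optional_stopping):
    """Simulate one experiment under a true null (mean 0, known sd 1).

    With optional_stopping=True the p value is checked after every
    observation from n_min onward and the experiment stops as soon as
    p < alpha. Otherwise only the final sample of size n_max is tested.
    Returns True if the null hypothesis is rejected.
    """
    total = 0.0
    for n in range(1, n_max + 1):
        total += rng.gauss(0.0, 1.0)
        if optional_stopping and n >= n_min:
            if two_sided_p(total / math.sqrt(n)) < alpha:
                return True
    return two_sided_p(total / math.sqrt(n_max)) < alpha


def false_positive_rate(optional_stopping, n_sims=2000, n_min=10,
                        n_max=200, alpha=0.05, seed=1):
    """Proportion of simulated null experiments that reach p < alpha."""
    rng = random.Random(seed)
    hits = sum(
        run_experiment(rng, n_min, n_max, alpha, optional_stopping)
        for _ in range(n_sims)
    )
    return hits / n_sims


if __name__ == "__main__":
    print("fixed n:          ", false_positive_rate(False))
    print("optional stopping:", false_positive_rate(True))
```

Under these assumptions the fixed-sample rate should stay close to the nominal .05, whereas the rate under optional stopping should be several times larger, and it grows further as the maximum sample size increases.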
In the context of NHST, it is therefore necessary to know the subjective intention with which an experiment was carried out. This key requirement is unattainable in a practical sense, and arguably undesirable in a philosophical sense. In addition, I will review a proof that the NHST p value does not measure statistical evidence. In order for the p value to qualify as a measure of statistical evidence, a minimum requirement is that identical p values convey identical levels of evidence, irrespective
Copyright 2007 Psychonomic Society, Inc.
THEORETICAL AND REVIEW ARTICLES
A practical solution to the pervasive problems of p values
ERIC-JAN WAGENMAKERS
University of Amsterdam, Amsterdam, The Netherlands
In the field of psychology, the practice of p value null-hypothesis testing is as widespread as ever. Despite this popularity, or perhaps because of it, most psychologists are not aware of the statistical peculiarities of the p value procedure. In particular, p values are based on data that were never observed, and these hypothetical data are themselves influenced by subjective intentions. Moreover, p values d...
The Akaike information criterion (AIC; Akaike, 1973) ... (e.g., Akaike, 1978, 1979; Bozdogan, 1987; Burnham & Anderson, 2002)
The evaluation of competing hypotheses is central to the process of scientific inquiry. When the competing hypotheses are stated in the form of predictions from quantitative models, their adequacy with respect to observed data can be rigorously assessed. Given K plausible candidate models of the underlying process that has generated the observed data, we should like to know which hypothesis or model approximates the "true" process best. More generally, we should like to know how much statistical evidence the data provide for each of the K models, preferably in terms of likelihood (Royall, 1997) or the probability of each of the models' being correct (or the most correct, because the generating model may never be known for certain). The process of evaluating candidate models is termed model selection or model evaluation.
A straightforward solution to the problem of evaluating several candidate models is to select the model that gives the most accurate description of the data. However, the process of model evaluation is complicated by the fact that a model with many free parameters is more flexible than a model with only a few parameters. It is clearly not desirable to always deem the most complex model the best, and it is generally accepted that the best model is the one that provides an adequate account of the data
We thank Han van der Maas and In Jae Myung for helpful comments on an earlier draft of this paper. Correspondence concerning this article can be addressed to E.-J. Wagenmakers,
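The trade-off between fit and flexibility described above can be made concrete with the standard AIC formula, AIC = -2 log L + 2k, where log L is the maximized log-likelihood and k is the number of free parameters. The sketch below is illustrative only: the three log-likelihoods and parameter counts are made-up numbers, and the function names are hypothetical. It also converts the AIC differences into Akaike weights, one commonly used way of expressing relative support among the K candidate models.

```python
import math


def aic(log_likelihood, n_params):
    """AIC = -2 * maximized log-likelihood + 2 * number of free parameters."""
    return -2.0 * log_likelihood + 2.0 * n_params


def akaike_weights(aic_values):
    """Turn AIC values into weights that sum to 1 across the candidate set.

    delta_i = AIC_i - min(AIC); w_i is proportional to exp(-delta_i / 2).
    """
    best = min(aic_values)
    rel = [math.exp(-(a - best) / 2.0) for a in aic_values]
    total = sum(rel)
    return [r / total for r in rel]


# Hypothetical candidate models: (maximized log-likelihood, free parameters).
# Model C fits best in raw likelihood but pays for its six parameters.
candidates = {"A": (-100.0, 2), "B": (-98.0, 3), "C": (-97.5, 6)}

aic_values = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
weights = dict(zip(aic_values, akaike_weights(list(aic_values.values()))))

if __name__ == "__main__":
    for name in candidates:
        print(name, aic_values[name], round(weights[name], 3))
```

In this made-up example the most complex model has the highest likelihood but not the lowest AIC, which is the point of penalizing the number of free parameters.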
Bayesian hypothesis testing presents an attractive alternative to p value hypothesis testing. Part I of this series outlined several advantages of Bayesian hypothesis testing, including the ability to quantify evidence and the ability to monitor and update this evidence as data come in, without the need to know the intention with which the data were collected. Despite these and other practical advantages, Bayesian hypothesis tests are still reported relatively rarely. An important impediment to the widespread adoption of Bayesian tests is arguably the lack of user-friendly software for the run-of-the-mill statistical problems that confront psychologists for the analysis of almost every experiment: the t-test, ANOVA, correlation, regression, and contingency tables. In Part II of this series we introduce JASP (http://www.jasp-stats.org), an open-source, cross-platform, user-friendly graphical software package that allows users to carry out Bayesian hypothesis tests for standard statistical problems. JASP is based in part on the Bayesian analyses implemented in Morey and Rouder’s BayesFactor package for R. Armed with JASP, the practical advantages of Bayesian hypothesis testing are only a mouse click away.
Bayesian parameter estimation and Bayesian hypothesis testing present attractive alternatives to classical inference using confidence intervals and p values. In Part I of this series we outline ten prominent advantages of the Bayesian approach. Many of these advantages translate to concrete opportunities for pragmatic researchers. For instance, Bayesian hypothesis testing allows researchers to quantify evidence and monitor its progression as data come in, without needing to know the intention with which the data were collected. We end by countering several objections to Bayesian hypothesis testing. Part II of this series discusses JASP, a free and open-source software program that makes it easy to conduct Bayesian estimation and testing for a range of popular statistical scenarios (Wagenmakers et al., this issue).
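The idea of monitoring evidence as data come in can be sketched with a simple binomial example. This is an illustration under stated assumptions, not the analysis implemented in JASP or the BayesFactor package: the null hypothesis fixes the success probability at 0.5, the alternative assigns it a uniform prior on [0, 1] (under which the marginal probability of k successes in n trials is 1/(n + 1)), and the function names and data are hypothetical.

```python
import math


def bf10_binomial(n, k):
    """Bayes factor for H1 (theta ~ Uniform(0, 1)) over H0 (theta = 0.5).

    Under H0 the probability of k successes in n trials is C(n, k) * 0.5**n.
    Under H1 with a uniform prior the marginal probability is 1 / (n + 1).
    Computed on the log scale to avoid underflow for large n.
    """
    log_m0 = math.log(math.comb(n, k)) + n * math.log(0.5)
    log_m1 = -math.log(n + 1)
    return math.exp(log_m1 - log_m0)


def monitor(outcomes):
    """Return the Bayes factor BF10 after each observation (1 = success)."""
    trajectory = []
    successes = 0
    for n, outcome in enumerate(outcomes, start=1):
        successes += outcome
        trajectory.append(bf10_binomial(n, successes))
    return trajectory


if __name__ == "__main__":
    # Hypothetical sequence of successes (1) and failures (0).
    data = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]
    for n, bf in enumerate(monitor(data), start=1):
        print(n, round(bf, 3))
```

Because the Bayes factor depends only on the data observed so far, the trajectory can be inspected after every observation without reference to a planned sample size.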
The veracity of substantive research claims hinges on the way experimental data are collected and analyzed. In this article, we discuss an uncomfortable fact that threatens the core of psychology's academic enterprise: almost without exception, psychologists do not commit themselves to a method of data analysis before they see the actual data. It then becomes tempting to fine-tune the analysis to the data in order to obtain a desired result, a procedure that invalidates the interpretation of the common statistical tests. The extent of the fine-tuning varies widely across experiments and experimenters but is almost impossible for reviewers and readers to gauge. To remedy the situation, we propose that researchers preregister their studies and indicate in advance the analyses they intend to conduct. Only these analyses deserve the label "confirmatory," and only for these analyses are the common statistical tests valid. Other analyses can be carried out, but these should be labeled "exploratory." We illustrate our proposal with a confirmatory replication attempt of a study on extrasensory perception.