A tradition that goes back to Karl R. Popper assesses the value of a statistical test primarily by its severity: was it an honest and stringent attempt to prove the theory wrong? For "error statisticians" such as Deborah Mayo (1996, 2018), and frequentists more generally, severity is a key virtue in hypothesis tests. Conversely, failure to incorporate severity into statistical inference, as allegedly happens in Bayesian inference, counts as a major methodological shortcoming. Our paper pursues a twofold goal: first, we argue that the error-statistical explication of severity has substantive drawbacks (i.e., neglect of research context; lack of connection to the specificity of predictions; problematic similarity of degrees of severity to one-sided p-values). Second, we argue that severity matters for Bayesian inference via the value of specific, risky predictions: severity boosts the expected evidential value of a Bayesian hypothesis test. We illustrate severity-based reasoning in Bayesian statistics by means of a practical example and discuss its advantages and potential drawbacks.
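To make the third drawback concrete, here is a minimal numeric sketch (our own illustration with hypothetical values for mu0, sigma, n, and xbar, not an excerpt from the paper): in a one-sided normal test with known variance, the Mayo-style degree of severity for the inference "mu > mu0" is exactly the complement of the one-sided p-value.

```python
from scipy.stats import norm

# One-sided normal test of H0: mu <= mu0 vs. H1: mu > mu0, sigma known.
# Mayo-style severity of the inference "mu > mu1" given observed mean xbar:
#   SEV(mu > mu1) = P(Xbar <= xbar; mu = mu1) = Phi((xbar - mu1) / (sigma / sqrt(n)))
# For mu1 = mu0 this is exactly 1 minus the one-sided p-value.

mu0, sigma, n = 0.0, 1.0, 25   # hypothetical test setup
xbar = 0.4                     # hypothetical observed sample mean
se = sigma / n ** 0.5
z = (xbar - mu0) / se

p_one_sided = 1 - norm.cdf(z)  # one-sided p-value of the observed mean
sev = norm.cdf(z)              # severity of the inference "mu > mu0"

print(f"one-sided p = {p_one_sided:.4f}, SEV(mu > mu0) = {sev:.4f}")
# -> one-sided p = 0.0228, SEV(mu > mu0) = 0.9772: exact complements.
```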
Theories are among the most important tools of science. As Lewin (1943) noted, "[t]here is nothing as practical as a good theory". Although psychologists have long discussed the problems of theory in their discipline, weak theories remain widespread in most subfields. One possible reason for this is that psychologists lack the tools to systematically assess the quality of their theories. Thagard (1989) developed a computational model for formal theory evaluation based on the concept of explanatory coherence. However, Thagard's (1989) model leaves room for improvement, and it is not available in the software that psychologists typically use. We therefore developed a new implementation of explanatory coherence based on the Ising model. We demonstrate the capabilities of this new Ising Model of Explanatory Coherence (IMEC) on several examples from psychology and other sciences. IMEC is also available as an R package, so that scientists can use it to evaluate the quality of their theories in practice.
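To give a rough idea of the underlying construction (an assumed, simplified mapping for exposition, not the actual API of the IMEC package), propositions can be encoded as Ising spins, explanatory relations as positive couplings, contradictions as negative couplings, and the most coherent accept/reject assignment read off as the minimum-energy state:

```python
import itertools

# Simplified sketch of explanatory coherence as an Ising model (assumed
# mapping for illustration): propositions are spins in {-1, +1}
# (rejected/accepted), explanation means a positive coupling, contradiction
# a negative one, and evidence receives an external field toward acceptance.

propositions = ["E1", "E2", "H1", "H2"]  # two evidence nodes, two hypotheses

# Couplings J[(i, j)]: positive if i coheres with j, negative if they clash.
J = {("H1", "E1"): 1.0, ("H1", "E2"): 1.0,  # H1 explains both pieces of evidence
     ("H2", "E1"): 1.0,                     # H2 explains only one
     ("H1", "H2"): -2.0}                    # the two hypotheses contradict

h = {"E1": 1.0, "E2": 1.0}  # external field: evidence leans toward acceptance

def energy(state):
    """Ising energy; lower energy = more coherent assignment."""
    e = -sum(w * state[i] * state[j] for (i, j), w in J.items())
    e -= sum(b * state[p] for p, b in h.items())
    return e

# Brute-force minimization over all assignments (fine for small networks).
best = min(
    (dict(zip(propositions, spins))
     for spins in itertools.product([-1, 1], repeat=len(propositions))),
    key=energy,
)
print(best)  # H1 accepted (+1), H2 rejected (-1): broader explanation wins
```

On this encoding, the hypothesis with broader explanatory power drives out its contradicting rival, which is the core intuition behind explanatory coherence.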
The content of this dissertation spans four years of work, carried out in the Netherlands (Tilburg University and University of Amsterdam) and Italy (University of Turin). It is part of the ERC project "Making Scientific Inference More Objective", led by Professor Jan Sprenger, which combined philosophy of science with empirical research. The dissertation can be summarized as a small set of modest attempts to contribute to improving scientific practice. Each of these attempts was geared towards either increasing understanding of a particular problem or making a contribution to how science can be practiced. The general focus was on retaining philosophical nuance while remaining methodologically practicable. The five papers contained in this dissertation are both methodologically and philosophically diverse. The first three (Chapters 2 through 4) are more empirical in nature and focus on understanding and evaluating how science is practiced: a meta-analysis of semantic intuitions research in experimental philosophy; a systematic review of the essay literature on the null hypothesis significance test; and an experiment on how teams of statisticians analyze the same data. The last two (Chapters 5 and 6) aim to improve scientific practice by providing philosophically well-founded tools for empirical research: a practicable and testable definition of scientific objectivity and a Bayesian operationalization of Popper's concept of a severe test.
For decades, waxing and waning, there has been an ongoing debate about the merits and problems of the ubiquitously used null hypothesis significance test (NHST). With the start of the replication crisis, this debate has flared up once again, especially in the psychology and psychological methods literature. Arguments for or against NHST are usually made in essays and opinion pieces that cover some, but not all, of the qualities and problems of the method. The NHST literature landscape is vast, a clear overview is lacking, and participants in the debate seem to be talking past one another. To contribute to a resolution, we conducted a systematic review of the essay literature on NHST published in psychology and psychological methods journals between 2011 and 2018. We extracted all arguments in defense of NHST (20) and against it (70), as well as the solutions (33) proposed to remedy (some of) the perceived problems of NHST. Unfiltered, these 123 items form a landscape that is prohibitively difficult to keep in view. Our contribution to the resolution of the NHST debate is twofold. 1) We performed a thematic synthesis of the arguments and solutions that carves the landscape into a framework of three zones: mild, moderate, and critical. This reduction summarizes groups of arguments and solutions, thus offering a manageable overview of NHST's qualities, problems, and solutions. 2) We provide the data on the arguments and solutions as a resource for those who will carry on the debate and/or study the use of NHST.