Bayesian t tests have become an increasingly popular alternative to null-hypothesis significance testing (NHST) in psychological research. In contrast to NHST, they allow for the quantification of evidence in favor of the null hypothesis and for optional stopping. A major drawback of Bayesian t tests, however, is that error probabilities of statistical decisions remain uncontrolled. Previous approaches in the literature to remedy this problem require either time-consuming simulations or the specification of prior distributions without a substantive meaning. In this article, we propose a sequential probability ratio test that combines Bayesian t tests with simple decision criteria developed by Abraham Wald in 1947. We discuss this sequential procedure, which we call the Waldian t test, in the context of default and informed Bayesian t tests. We show that Waldian t tests reliably control frequentist error probabilities, with the nominal Type I and Type II error probabilities serving as upper bounds to the actual error rates. At the same time, the prior distributions of Bayesian t tests are preserved. Thus, Waldian t tests are fully justified from both a frequentist and a Bayesian point of view. We highlight the relationship between frequentist and Bayesian error probabilities and critically discuss the implications of conventional stopping criteria for sequential Bayesian t tests. Finally, we provide a user-friendly web application that implements the proposed procedure for substantive researchers.
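The decision criteria mentioned above can be illustrated with a minimal sketch. It assumes the standard approximate boundaries from Wald's sequential probability ratio test, A = (1 - β)/α and B = β/(1 - α), applied to a Bayes factor BF10 that is recomputed after each new observation. The abstract does not state the exact boundaries the authors use, so the thresholds, the function names, and the example Bayes factor values below are all assumptions for illustration only.

```python
def wald_thresholds(alpha, beta):
    """Return (lower, upper) stopping boundaries for a ratio such as BF10.

    Assumption: Wald's classic approximations, not necessarily the exact
    criteria used in the article.
    """
    if not (0 < alpha < 1 and 0 < beta < 1):
        raise ValueError("alpha and beta must lie strictly between 0 and 1")
    upper = (1 - beta) / alpha   # stop and accept H1 once BF10 >= upper
    lower = beta / (1 - alpha)   # stop and accept H0 once BF10 <= lower
    return lower, upper


def sequential_decision(bayes_factors, alpha=0.05, beta=0.05):
    """Walk through BF10 values in the order they were observed.

    Returns a (decision, n) tuple, where n is the step at which sampling
    stopped, or the number of values seen if no boundary was crossed.
    """
    lower, upper = wald_thresholds(alpha, beta)
    for n, bf in enumerate(bayes_factors, start=1):
        if bf >= upper:
            return "accept H1", n
        if bf <= lower:
            return "accept H0", n
    return "continue sampling", len(bayes_factors)


# Hypothetical sequence of Bayes factors, one per added observation.
decision, n = sequential_decision([1.2, 3.0, 8.5, 20.1])
print(decision, n)
```

With α = β = 0.05 the boundaries are roughly 1/19 and 19, so sampling continues as long as the Bayes factor stays between those two values.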
On the internet, people often collaborate to generate extensive knowledge bases such as Wikipedia for semantic information or OpenStreetMap for geographic information. When contributing to such online projects, individual judgments follow a sequential process in which one contributor creates an entry and other contributors have the possibility to modify, extend, and correct the entry by making incremental changes. We refer to this way of working together as sequential collaboration because it is characterized by dependent judgments that are based on the latest judgment available. Since the process of correcting each other in sequential collaboration has not yet been studied systematically, we compare the accuracy of sequential collaboration and wisdom of crowds, the aggregation of a set of independent judgments. In three experiments with groups of four or six individuals, accuracy for answering general knowledge questions increased within sequences of judgments in which participants had the possibility to correct the judgment of the previous contributor. Moreover, the final individual judgments in sequential collaboration were slightly more accurate than the averaged judgments in wisdom of crowds. This shows that collaboration can benefit from the dependency of individual judgments, thus explaining why large collaborative online projects often provide data of high quality.
Bayes factors allow researchers to test the effects of experimental manipulations in within-subjects designs using mixed-effects models. van Doorn et al. (2021) showed that such hypothesis tests can be performed by comparing different pairs of models which vary in the specification of the fixed- and random-effect structure for the within-subjects factor. To discuss the question of which of these model comparisons is most appropriate, van Doorn et al. used a case study to compare the corresponding Bayes factors. We argue that researchers should not only focus on pairwise comparisons of two nested models but rather use the Bayes factor for performing model selection among a larger set of mixed models that represent different auxiliary assumptions. In a standard one-factorial, repeated-measures design, the comparison should include four mixed-effects models: fixed-effects H0, fixed-effects H1, random-effects H0, and random-effects H1. Thereby, the Bayes factor enables testing both the average effect of condition and the heterogeneity of effect sizes across individuals. Bayesian model averaging provides an inclusion Bayes factor which quantifies the evidence for or against the presence of an effect of condition while taking model-selection uncertainty about the heterogeneity of individual effects into account. We present a simulation study showing that model selection among a larger set of mixed models performs well in recovering the true, data-generating model.
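As a rough sketch of how an inclusion Bayes factor can be obtained by model averaging over the four mixed models, the code below divides the posterior odds of the models that include the effect of condition by their prior odds. The model labels, the equal prior probabilities, and the marginal-likelihood values are hypothetical placeholders, not results from the article, and the function name is an assumption.

```python
def inclusion_bayes_factor(marginal_likelihoods, prior_probs, effect_models):
    """Inclusion Bayes factor for the models listed in effect_models.

    marginal_likelihoods and prior_probs are dicts keyed by model label.
    The result is (posterior inclusion odds) / (prior inclusion odds).
    """
    # Unnormalized posterior weight of each model: p(data | M) * p(M).
    weights = {m: marginal_likelihoods[m] * prior_probs[m]
               for m in marginal_likelihoods}
    post_in = sum(w for m, w in weights.items() if m in effect_models)
    post_out = sum(w for m, w in weights.items() if m not in effect_models)
    prior_in = sum(p for m, p in prior_probs.items() if m in effect_models)
    prior_out = sum(p for m, p in prior_probs.items() if m not in effect_models)
    return (post_in / post_out) / (prior_in / prior_out)


# Hypothetical marginal likelihoods for the four models (illustration only).
ml = {"fixed_H0": 1.0, "fixed_H1": 3.0, "random_H0": 2.0, "random_H1": 6.0}
priors = {m: 0.25 for m in ml}  # assumption: equal prior model probabilities
bf_incl = inclusion_bayes_factor(ml, priors, {"fixed_H1", "random_H1"})
print(bf_incl)
```

In practice, marginal likelihoods are usually handled on the log scale to avoid numerical underflow; the plain-scale version here is kept only for readability.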