We propose to change the default P-value threshold for statistical significance from 0.05 to 0.005 for claims of new discoveries.

The lack of reproducibility of scientific studies has caused growing concern over the credibility of claims of new discoveries based on 'statistically significant' findings. There has been much progress toward documenting and addressing several causes of this lack of reproducibility (for example, multiple testing, P-hacking, publication bias and under-powered studies). However, we believe that a leading cause of non-reproducibility has not yet been adequately addressed: statistical standards of evidence for claiming new discoveries in many fields of science are simply too low. Associating statistically significant findings with P < 0.05 results in a high rate of false positives even in the absence of other experimental, procedural and reporting problems.

For fields where the threshold for defining statistical significance for new discoveries is P < 0.05, we propose a change to P < 0.005. This simple step would immediately improve the reproducibility of scientific research in many fields. Results that would currently be called significant but do not meet the new threshold should instead be called suggestive. While statisticians have long known the relative weakness of using P ≈ 0.05 as a threshold for discovery, and the proposal to lower it to 0.005 is not new (refs 1, 2), a critical mass of researchers now endorse this change.

We restrict our recommendation to claims of discovery of new effects. We do not address the appropriate threshold for confirmatory or contradictory replications of existing claims. We also do not advocate changes to discovery thresholds in fields that have already adopted more stringent standards (for example, genomics and high-energy physics research; see the 'Potential objections' section below).

We also restrict our recommendation to studies that conduct null hypothesis significance tests. We have diverse views about how best to improve reproducibility, and many of us believe that other ways of summarizing the data, such as Bayes factors or other posterior summaries based on clearly articulated model assumptions, are preferable to P values. However, changing the P-value threshold is simple, aligns with the training undertaken by many researchers, and might quickly achieve broad acceptance.
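To make the false-positive argument concrete, here is a minimal sketch (assuming illustrative values for the prior probability that a tested effect is real and for statistical power, neither of which comes from the proposal) of how the share of false positives among 'significant' findings changes with the threshold. For simplicity it holds power fixed across thresholds, which in practice it would not be.

```python
def false_positive_fraction(alpha: float, power: float, prior: float) -> float:
    """Fraction of threshold-crossing results that are false positives.

    alpha: significance threshold, the rate at which true nulls cross it
    power: probability that a real effect crosses the threshold
    prior: prior probability that a tested effect is real
    """
    false_pos = alpha * (1.0 - prior)   # true nulls declared significant
    true_pos = power * prior            # real effects declared significant
    return false_pos / (false_pos + true_pos)

# Illustrative assumptions: 10% of tested effects are real, 80% power.
for alpha in (0.05, 0.005):
    frac = false_positive_fraction(alpha, power=0.80, prior=0.10)
    print(f"alpha = {alpha}: ~{frac:.0%} of significant findings are false positives")
```

Under these assumed numbers, roughly a third of findings significant at P < 0.05 are false positives, versus about 5% at P < 0.005, which is the intuition behind the proposal.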
Two connectionist frameworks, GRAIN (J. L. McClelland, 1993) and brain-state-in-a-box (J. A. Anderson, 1991), and R. Ratcliff's (1978) diffusion model were evaluated using data from a signal detection task. Dependent variables included response probabilities, reaction times for correct and error responses, and shapes of reaction-time distributions. The diffusion model accounted for all aspects of the data, including error reaction times that had previously been a problem for all response-time models. The connectionist models accounted for many aspects of the data adequately, but each failed to a greater or lesser degree in important ways except for one model that was similar to the diffusion model. The findings advance the development of the diffusion model and show that the long tradition of reaction-time research and theory is a fertile domain for development and testing of connectionist assumptions about how decisions are generated over time.
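As a rough illustration of how a diffusion model generates choices and response times over time, here is a minimal simulation sketch. The parameter values are arbitrary assumptions, not estimates from the article, and this basic version omits the across-trial parameter variability needed to capture correct-versus-error RT differences.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_diffusion(drift=0.2, boundary=1.0, dt=0.001, n_trials=2000):
    """Euler-Maruyama simulation of a two-boundary diffusion process.

    Evidence starts midway between absorbing boundaries at 0 and
    `boundary` and drifts toward the upper (correct) boundary; a trial
    ends when either boundary is crossed. Unit diffusion coefficient.
    """
    choices, rts = [], []
    for _ in range(n_trials):
        x, t = boundary / 2.0, 0.0
        while 0.0 < x < boundary:
            x += drift * dt + np.sqrt(dt) * rng.standard_normal()
            t += dt
        choices.append(x >= boundary)   # True = correct response
        rts.append(t)
    return np.asarray(choices), np.asarray(rts)

choices, rts = simulate_diffusion()
print(f"P(correct) = {choices.mean():.2f}")
print(f"mean RT: correct {rts[choices].mean():.3f}s, error {rts[~choices].mean():.3f}s")
```

The simulated RTs show the right-skewed distributions characteristic of empirical data; capturing the error RT patterns discussed above requires the fuller model.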
Response inhibition is an important act of control in many domains of psychology and neuroscience. It is often studied in a stop-signal task that requires subjects to inhibit an ongoing action in response to a stop signal. Performance in the stop-signal task is understood as a race between a go process that underlies the action and a stop process that inhibits the action. Responses are inhibited if the stop process finishes before the go process. The finishing time of the stop process is not directly observable; a mathematical model is required to estimate its duration. Logan and Cowan (1984) developed an independent race model that is widely used for this purpose. We present a general race model that extends the independent race model to account for the role of choice in go and stop processes, and a special race model that assumes each runner is a stochastic accumulator governed by a diffusion process. We apply the models to 2 data sets to test assumptions about the selective influence of capacity limitations on drift rates and of strategies on thresholds, which are largely confirmed. The model provides estimates of distributions of stop-signal response times, which previous models could not estimate. We discuss implications of viewing cognitive control as the result of a repertoire of acts of control tailored to different tasks and situations.
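A minimal sketch of the independent race logic follows; the distributional choices and parameter values are illustrative assumptions only, and the models in the article replace these simple draws with stochastic accumulators.

```python
import numpy as np

rng = np.random.default_rng(1)

def stop_trial(ssd, go_mu=0.50, go_sd=0.10, ssrt_mu=0.20, ssrt_sd=0.05):
    """One stop-signal trial: independent go and stop finishing times.

    The stop runner starts at the stop-signal delay (SSD); the response
    is inhibited if it finishes before the go runner. Normal finishing
    times are an illustrative assumption, not the article's model.
    """
    go_finish = rng.normal(go_mu, go_sd)
    stop_finish = ssd + rng.normal(ssrt_mu, ssrt_sd)
    return stop_finish < go_finish      # True = response inhibited

for ssd in (0.10, 0.20, 0.30):
    p_inhibit = np.mean([stop_trial(ssd) for _ in range(10_000)])
    print(f"SSD = {ssd:.2f}s: P(inhibit) = {p_inhibit:.2f}")
```

As expected under the race account, the probability of inhibiting falls as the stop signal is delayed, because the stop runner starts the race later.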
Most models of recognition memory rely on a strength/familiarity-based signal detection account that assumes that the processes giving rise to a confidence judgment are the same as those giving rise to an old-new decision. Confidence is assumed to be scaled directly from the perceived familiarity of a probe. This assumption was tested in 2 experiments that examine the shape of confidence-based z receiver operating characteristic (zROC) curves under different levels of response bias induced by changing stimulus probabilities (Experiment 1) and payoffs (Experiment 2). Changes in the shape of the zROC curves with bias indicate that confidence is not scaled directly from perceived familiarity or likelihood. A model of information accumulation in recognition memory is proposed that can account for the observed effects.
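To make the zROC construction concrete, here is a small sketch using invented cumulative hit and false-alarm rates at successive confidence criteria (none of these numbers come from the experiments):

```python
import numpy as np
from scipy.stats import norm

# Cumulative response proportions from the strictest ("sure old") to the
# most lenient confidence criterion; values are invented for illustration.
hit_rates = [0.30, 0.55, 0.70, 0.82, 0.92]
fa_rates = [0.05, 0.15, 0.28, 0.45, 0.70]

# Each zROC point is (z(false-alarm rate), z(hit rate)).
z_fa, z_hit = norm.ppf(fa_rates), norm.ppf(hit_rates)
slope, intercept = np.polyfit(z_fa, z_hit, 1)
print(f"zROC slope = {slope:.2f}, intercept = {intercept:.2f}")
```

Under a direct-scaling signal detection account, shifting bias should move points along a single zROC without changing its shape; the finding that the shape changes with bias is what motivates the accumulation model.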
Among the most valuable tools in behavioral science is statistically fitting mathematical models of cognition to data, response time distributions in particular. However, techniques for fitting distributions vary widely, and little is known about the efficacy of different techniques. In this article, we assess several fitting techniques by simulating six widely cited models of response time and using the fitting procedures to recover model parameters. The techniques include the maximization of likelihood and least squares fits of the theoretical distributions to different empirical estimates of the simulated distributions. A running example is used to illustrate the different estimation and fitting procedures. The simulation studies reveal that empirical density estimates are biased even for very large sample sizes. Some fitting techniques yield more accurate and less variable parameter estimates than do others. Methods that involve least squares fits to density estimates generally yield very poor parameter estimates.

The importance of considering the entire response time (RT) distribution in testing formal models of cognition is now widely appreciated. Fitting a model to mean RT alone can mask important details of the data that examination of the entire distribution would reveal, such as the behavior of fast and slow responses across the conditions of an experiment (e.g., Heathcote, Popiel, & Mewhort, 1991), the extent of facilitation between perceptual channels (Miller, 1982), and the effects of practice on RT quantiles (Logan, 1992). Techniques for testing hypotheses based on the RT distribution have been developed (Townsend, 1990). In addition, the RT distribution provides an important meeting ground between theory and data; the ability of a model to predict the observed shape of the RT distribution is seen as a critical test of that model (Luce, 1986).

Many models state explicitly the characteristics of RT by specifying it as a random variable. All of the information about a random variable is contained in its probability density function (density, for short) or cumulative distribution function (CDF). The density represents the likelihood that an RT is observed within some arbitrarily small window of time, whereas the CDF represents the probability that an RT is less than or equal to some specific time. Most models of RT predict CDFs that are ogival: monotonic, nondecreasing S-shaped functions that begin at zero and asymptote at one. The RT densities predicted by most models are, in contrast, bell shaped: nonnegative functions, typically unimodal and skewed to the right.
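As a concrete instance of the likelihood-maximization approach discussed here, the sketch below fits an ex-Gaussian distribution, a common descriptive RT model (used here for convenience; it is not necessarily one of the six simulated models), to simulated RTs:

```python
import numpy as np
from scipy.stats import exponnorm

rng = np.random.default_rng(2)

# Simulate RTs from an ex-Gaussian (normal plus exponential component);
# the generating parameters below are illustrative assumptions.
mu, sigma, tau = 0.400, 0.050, 0.150
rts = rng.normal(mu, sigma, size=1000) + rng.exponential(tau, size=1000)

# Maximum-likelihood fit. scipy parameterizes the ex-Gaussian
# (exponnorm) by shape K = tau / sigma, loc = mu, scale = sigma.
K_hat, mu_hat, sigma_hat = exponnorm.fit(rts)
tau_hat = K_hat * sigma_hat
print(f"mu ≈ {mu_hat:.3f}, sigma ≈ {sigma_hat:.3f}, tau ≈ {tau_hat:.3f}")
```

Because this fit maximizes the likelihood of the raw RTs directly, it sidesteps the biased empirical density estimates that undermine least squares density fitting.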