We propose to change the default P-value threshold for statistical significance from 0.05 to 0.005 for claims of new discoveries. The lack of reproducibility of scientific studies has caused growing concern over the credibility of claims of new discoveries based on 'statistically significant' findings. There has been much progress toward documenting and addressing several causes of this lack of reproducibility (for example, multiple testing, P-hacking, publication bias and under-powered studies). However, we believe that a leading cause of non-reproducibility has not yet been adequately addressed: statistical standards of evidence for claiming new discoveries in many fields of science are simply too low. Associating statistically significant findings with P < 0.05 results in a high rate of false positives even in the absence of other experimental, procedural and reporting problems. For fields where the threshold for defining statistical significance for new discoveries is P < 0.05, we propose a change to P < 0.005. This simple step would immediately improve the reproducibility of scientific research in many fields. Results that would currently be called significant but do not meet the new threshold should instead be called suggestive. While statisticians have known the relative weakness of using P ≈ 0.05 as a threshold for discovery and the proposal to lower it to 0.005 is not new [1,2], a critical mass of researchers now endorse this change. We restrict our recommendation to claims of discovery of new effects. We do not address the appropriate threshold for confirmatory or contradictory replications of existing claims. We also do not advocate changes to discovery thresholds in fields that have already adopted more stringent standards (for example, genomics and high-energy physics research; see the 'Potential objections' section below). We also restrict our recommendation to studies that conduct null hypothesis significance tests. 
We have diverse views about how best to improve reproducibility, and many of us believe that other ways of summarizing the data, such as Bayes factors or other posterior summaries based on clearly articulated model assumptions, are preferable to P values. However, changing the P value threshold is simple, aligns with the training undertaken by many researchers, and might quickly achieve broad acceptance.
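The claim that a P < 0.05 threshold produces a high rate of false positives, even with no other problems, can be illustrated with a short calculation. The sketch below is not taken from the text above: the function name, the 1:10 prior odds of a real effect, and the 80% power are all illustrative assumptions, chosen only to show how the share of false positives among significant results depends on the threshold.

```python
def false_positive_rate(alpha: float, power: float, prior_odds_h1: float) -> float:
    """Share of 'significant' results that are false positives.

    alpha         -- significance threshold (probability a true null is rejected)
    power         -- probability a real effect is detected
    prior_odds_h1 -- assumed odds P(H1) / P(H0) before seeing the data
    """
    p_h1 = prior_odds_h1 / (1.0 + prior_odds_h1)
    p_h0 = 1.0 - p_h1
    false_pos = alpha * p_h0  # true nulls that cross the threshold
    true_pos = power * p_h1   # real effects that cross the threshold
    return false_pos / (false_pos + true_pos)


if __name__ == "__main__":
    # Hypothetical inputs: 1:10 prior odds and 80% power are assumptions.
    for alpha in (0.05, 0.005):
        rate = false_positive_rate(alpha, power=0.8, prior_odds_h1=0.1)
        print(f"alpha={alpha}: false positive rate = {rate:.3f}")
```

Under these assumed inputs, roughly 38% of results significant at 0.05 would be false positives, against roughly 6% at 0.005. Holding power fixed while lowering the threshold implicitly assumes larger samples, so the comparison is a simplification rather than a forecast.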
Experimental economists are leaving the reservation. They are recruiting subjects in the field rather than in the classroom, using field goods rather than induced valuations, and using field context rather than abstract terminology in instructions. We argue that there is something methodologically fundamental behind this trend. Field experiments differ from laboratory experiments in many ways. Although it is tempting to view field experiments as simply less controlled variants of laboratory experiments, we argue that to do so would be to seriously mischaracterize them. What passes for "control" in laboratory experiments might in fact be precisely the opposite if it is artificial to the subject or context of the task. We propose six factors that can be used to determine the field context of an experiment: the nature of the subject pool, the nature of the information that the subjects bring to the task, the nature of the commodity, the nature of the task or trading rules applied, the nature of the stakes, and the environment that subjects operate in.
Economists have increasingly turned to the experimental model of the physical sciences as a method to understand human behavior. Peer-reviewed articles using the methodology of experimental economics were almost nonexistent until the mid-1960s and surpassed 50 annually for the first time in 1982; and by 1998, the number of experimental papers published per year exceeded 200 (Holt, 2006). Lab experiments allow the investigator to influence the set of prices, budget sets, information sets, and actions available to actors, and thus measure the impact of these factors on behavior within the context of the laboratory. The allure of the laboratory experimental method in economics is that, in principle, it provides ceteris paribus observations of individual economic agents, which are otherwise difficult to obtain. A critical assumption underlying the interpretation of data from many laboratory experiments is that the insights gained in the lab can be extrapolated to the world beyond, a principle we denote as generalizability. For physical laws and processes like gravity, photosynthesis, and mitosis, the evidence supports the idea that what happens in the lab is equally valid in the broader world. The American astronomer Harlow Shapley (1964, p. 43), for instance, noted that "as far as we can tell, the same physical laws prevail everywhere." In this manner, astronomers are able to infer the quantity of certain gases in the Sunflower galaxy, for example, from observations of signature wavelengths of light emitted from that galaxy.
A critical question facing experimental economists is whether behavior inside the laboratory is a good indicator of behavior outside the laboratory. To address that question, we build a model in which the choices that individuals make depend not just on financial implications, but also on the nature and extent of scrutiny by others, the particular context in which a decision is embedded, and the manner in which participants and tasks are selected. We present empirical evidence demonstrating the importance of these various factors. To the extent that lab and naturally occurring environments systematically differ on any of these dimensions, the results obtained inside and outside the lab need not correspond. Focusing on experiments designed to measure social preferences, we discuss the extent to which the existing laboratory results generalize to naturally-occurring markets. We summarize cases where the lab may understate the importance of social preferences as well as instances in which the lab might exaggerate their importance. We conclude by emphasizing the importance of interpreting laboratory and field data through the lens of theory.