[Survey chart: Is there a reproducibility crisis? 1,576 researchers surveyed — 52% yes, a significant crisis; 38% yes, a slight crisis; 7% don't know; 3% no, there is no crisis.]

More than 70% of researchers have tried and failed to reproduce another scientist's experiments, and more than half have failed to reproduce their own experiments. Those are some of the telling figures that emerged from Nature's survey of 1,576 researchers who took a brief online questionnaire on reproducibility in research. The data reveal sometimes-contradictory attitudes towards reproducibility. Although 52% of those surveyed agree that there is a significant 'crisis' of reproducibility, less than 31% think that failure to reproduce published results means that the result is probably wrong, and most say that they still trust the published literature. Data on how much of the scientific literature is reproducible are rare and generally bleak. The best-known analyses, from psychology [1] and cancer biology [2], found rates of around 40% and 10%, respectively. Our survey respondents were more optimistic: 73% said that they think that at least half of the papers in their field can be trusted, with physicists and chemists generally showing the most confidence. The results capture a confusing snapshot of attitudes around these issues, says Arturo Casadevall, a microbiologist at the Johns Hopkins Bloomberg School of Public Health in Baltimore, Maryland. "At the current time there is no consensus on what reproducibility is or should be." But just recognizing that is a step forward, he says. "The next step may be identifying what is the problem and to get a consensus."
We propose to change the default P-value threshold for statistical significance from 0.05 to 0.005 for claims of new discoveries.

The lack of reproducibility of scientific studies has caused growing concern over the credibility of claims of new discoveries based on 'statistically significant' findings. There has been much progress toward documenting and addressing several causes of this lack of reproducibility (for example, multiple testing, P-hacking, publication bias and underpowered studies). However, we believe that a leading cause of non-reproducibility has not yet been adequately addressed: statistical standards of evidence for claiming new discoveries in many fields of science are simply too low. Associating statistically significant findings with P < 0.05 results in a high rate of false positives even in the absence of other experimental, procedural and reporting problems.

For fields where the threshold for defining statistical significance for new discoveries is P < 0.05, we propose a change to P < 0.005. This simple step would immediately improve the reproducibility of scientific research in many fields. Results that would currently be called significant but do not meet the new threshold should instead be called suggestive. While statisticians have known the relative weakness of using P ≈ 0.05 as a threshold for discovery, and the proposal to lower it to 0.005 is not new [1,2], a critical mass of researchers now endorse this change.

We restrict our recommendation to claims of discovery of new effects. We do not address the appropriate threshold for confirmatory or contradictory replications of existing claims. We also do not advocate changes to discovery thresholds in fields that have already adopted more stringent standards (for example, genomics and high-energy physics research; see the 'Potential objections' section below).

We also restrict our recommendation to studies that conduct null hypothesis significance tests. We have diverse views about how best to improve reproducibility, and many of us believe that other ways of summarizing the data, such as Bayes factors or other posterior summaries based on clearly articulated model assumptions, are preferable to P values. However, changing the P-value threshold is simple, aligns with the training undertaken by many researchers, and might quickly achieve broad acceptance.
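To make the threshold argument concrete, the sketch below computes the false-positive risk among nominally significant findings under assumed conditions: 1:10 prior odds that a tested effect is real and 80% power. The function name and these numbers are illustrative assumptions, not the calculation reported in the proposal.

```python
# Illustrative sketch (assumed prior odds and power, not the paper's numbers):
# fraction of "significant" results that are false positives at a given threshold.

def false_positive_risk(alpha, power, prior_odds_true):
    """Share of significant findings that are false positives.

    alpha           -- significance threshold (Type I error rate)
    power           -- probability of detecting a real effect (1 - beta)
    prior_odds_true -- odds that a tested hypothesis is true, e.g. 0.1 for 1:10
    """
    p_true = prior_odds_true / (1 + prior_odds_true)
    p_false = 1 - p_true
    false_pos = alpha * p_false   # true nulls that cross the threshold
    true_pos = power * p_true     # real effects that cross the threshold
    return false_pos / (false_pos + true_pos)

# Assumed values for illustration: 1:10 prior odds, 80% power.
for alpha in (0.05, 0.005):
    print(alpha, round(false_positive_risk(alpha, 0.80, 1 / 10), 3))
# With these assumptions, roughly 38% of P < 0.05 "discoveries" are false
# positives, versus roughly 6% at P < 0.005.
```

Under these assumed conditions, tightening the threshold by a factor of ten cuts the share of false discoveries from about a third to well under a tenth, which is the intuition behind the proposal.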
Another social science looks at itself

Experimental economists have joined the reproducibility discussion by replicating selected published experiments from two top-tier journals in economics. Camerer et al. found that two-thirds of the 18 studies examined yielded replicable estimates of effect size and direction. This proportion is somewhat lower than unaffiliated experts were willing to bet in an associated prediction market, but roughly in line with expectations from sample sizes and P values.

Science, this issue p. 1433
Here we provide further details on the replications, the estimation of standardized effect sizes and complementary replicability indicators, the implementation of the prediction markets and surveys, the comparison of prediction market beliefs, survey beliefs, and replication outcomes, the comparison of reproducibility indicators to experimental economics and the psychological sciences, and additional results and data for the individual studies and markets. The code used for the estimation of replication power, standardized effect sizes, all complementary replication indicators, and all results is posted at OSF (https://osf.io/pfdyw/).

Replications

Inclusion criteria

We replicated 21 experimental studies in the social sciences published between 2010 and 2015 in Nature and Science. We included all studies that fulfilled our inclusion criteria for: (i) the journal and time period, (ii) the type of experiment, (iii) the subjects included in the experiment, (iv) the equipment and materials needed to implement the experiment, and (v) the results reported in the experiment. We did not exclude studies that had already been subject to a replication, as this could affect the representativeness of the included studies. We define and discuss the five inclusion criteria below.

Journal and time period: We included experimental studies published in Nature and Science between 2010 and 2015. The reason for focusing on these two journals is that they are typically considered the two most prestigious general science journals. Articles published in these journals are considered exciting, innovative, and important, which is also reflected in their high impact factors.

[Table notes, tables not shown: replication means a significant effect (p < 0.05) in the same direction as the original study; stage-1 replications had 90% power to detect 75% of the original effect size, and stage 2 added 90% power to detect 50% of the original effect size; relative standardized effect sizes, observation and participant counts, and mean tokens and shares per prediction-market transaction are also tabulated.]
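The posted OSF repository contains the actual estimation code; as a rough illustration of the kind of computation involved, the sketch below converts a p value and sample size to a standardized effect size r via the implied z statistic and reports the replication effect size relative to the original. The function names and example numbers are hypothetical and are not taken from the posted code.

```python
# Illustrative sketch (not the posted OSF code): standardized effect size r
# from a two-sided p value and sample size, plus the relative effect size
# used as one replicability indicator. Example numbers are hypothetical.
import math
from statistics import NormalDist

def r_from_p(p_two_sided, n):
    """Effect size r via the z statistic implied by a two-sided p value (r = z / sqrt(n))."""
    z = NormalDist().inv_cdf(1 - p_two_sided / 2)
    return z / math.sqrt(n)

def relative_effect_size(r_replication, r_original):
    """Replication effect size expressed as a share of the original effect size."""
    return r_replication / r_original

r_orig = r_from_p(0.01, 120)   # hypothetical original study
r_rep = r_from_p(0.04, 300)    # hypothetical replication with a larger sample
print(round(r_orig, 3), round(r_rep, 3), round(relative_effect_size(r_rep, r_orig), 2))
# -> 0.235 0.119 0.5: the replication recovers about half of the original effect.
```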
A key aspect of human behaviour is cooperation. We tend to help others even if costs are involved. We are more likely to help when the costs are small and the benefits for the other person significant. Cooperation leads to a tension between what is best for the individual and what is best for the group. A group does better if everyone cooperates, but each individual is tempted to defect. Recently there has been much interest in exploring the effect of costly punishment on human cooperation. Costly punishment means paying a cost for another individual to incur a cost. It has been suggested that costly punishment promotes cooperation even in non-repeated games and without any possibility of reputation effects. But most of our interactions are repeated and reputation is always at stake. Thus, if costly punishment is important in promoting cooperation, it must do so in a repeated setting. We have performed experiments in which, in each round of a repeated game, people choose between cooperation, defection and costly punishment. In control experiments, people could only cooperate or defect. Here we show that the option of costly punishment increases the amount of cooperation but not the average payoff of the group. Furthermore, there is a strong negative correlation between total payoff and use of costly punishment. Those people who gain the highest total payoff tend not to use costly punishment: winners don't punish. This suggests that costly punishment behaviour is maladaptive in cooperation games and might have evolved for other reasons.
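To make the structure of such a game concrete, here is a minimal sketch of per-round payoffs with cooperation, defection and costly punishment. The cost, benefit and harm parameters, and the treatment of defection as simply withholding help, are assumed for illustration and are not necessarily the values used in the experiments.

```python
# Illustrative sketch of per-round payoffs in a repeated game with
# cooperation (C), defection (D) and costly punishment (P). The parameter
# values below are assumed for illustration, not the experiment's values.

COST_OF_COOPERATION = 1   # what a cooperator pays
BENEFIT_TO_PARTNER = 2    # what the cooperator's partner receives
COST_OF_PUNISHING = 1     # what a punisher pays
PUNISHMENT_HARM = 4       # what the punished player loses

def round_payoffs(move_a, move_b):
    """Return (payoff_a, payoff_b) for one round, given moves 'C', 'D' or 'P'.
    Defection is modeled here as doing nothing."""
    pay_a = pay_b = 0.0
    # Effect of A's move on both players
    if move_a == 'C':
        pay_a -= COST_OF_COOPERATION
        pay_b += BENEFIT_TO_PARTNER
    elif move_a == 'P':
        pay_a -= COST_OF_PUNISHING
        pay_b -= PUNISHMENT_HARM
    # Effect of B's move (symmetric)
    if move_b == 'C':
        pay_b -= COST_OF_COOPERATION
        pay_a += BENEFIT_TO_PARTNER
    elif move_b == 'P':
        pay_b -= COST_OF_PUNISHING
        pay_a -= PUNISHMENT_HARM
    return pay_a, pay_b

# Mutual cooperation beats mutual punishment: (1, 1) versus (-5, -5),
# while a lone defector outearns the cooperator in a C-vs-D round.
print(round_payoffs('C', 'C'), round_payoffs('P', 'P'), round_payoffs('C', 'D'))
```

Whatever the exact parameters, the structure makes the tension visible: punishment reduces both players' round payoffs, which is consistent with the reported negative correlation between total payoff and use of costly punishment.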