[Survey chart: "Is there a reproducibility crisis?" 52% yes, a significant crisis; 38% yes, a slight crisis; 3% no, there is no crisis; 7% don't know. 1,576 researchers surveyed.]

More than 70% of researchers have tried and failed to reproduce another scientist's experiments, and more than half have failed to reproduce their own experiments. Those are some of the telling figures that emerged from Nature's survey of 1,576 researchers who took a brief online questionnaire on reproducibility in research. The data reveal sometimes-contradictory attitudes towards reproducibility. Although 52% of those surveyed agree that there is a significant 'crisis' of reproducibility, less than 31% think that failure to reproduce published results means that the result is probably wrong, and most say that they still trust the published literature.

Data on how much of the scientific literature is reproducible are rare and generally bleak. The best-known analyses, from psychology (1) and cancer biology (2), found rates of around 40% and 10%, respectively. Our survey respondents were more optimistic: 73% said that they think that at least half of the papers in their field can be trusted, with physicists and chemists generally showing the most confidence.

The results capture a confusing snapshot of attitudes around these issues, says Arturo Casadevall, a microbiologist at the Johns Hopkins Bloomberg School of Public Health in Baltimore, Maryland. "At the current time there is no consensus on what reproducibility is or should be." But just recognizing that is a step forward, he says. "The next step may be identifying what is the problem and to get a consensus."
Over the last ten years, Oosterhof and Todorov's valence-dominance model has emerged as the most prominent account of how people evaluate faces on social dimensions. In this model, two dimensions (valence and dominance) underpin social judgments of faces. Because this model has primarily been developed and tested in Western regions, it is unclear whether these findings apply to other regions. We addressed this question by replicating Oosterhof and Todorov's methodology across 11 world regions, 41 countries, and 11,570 participants. When we used Oosterhof and Todorov's original analysis strategy, the valence-dominance model generalized across regions. When we used an alternative methodology that allowed the dimensions to correlate, we observed much less generalization. Collectively, these results suggest that the valence-dominance model generalizes very well across regions when the dimensions are forced to be orthogonal, but that regional differences emerge when a different extraction method is used and the dimension-reduction solution is allowed to correlate and rotate.
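To make the contrast between the two analysis strategies concrete, here is a minimal, purely illustrative simulation; the number of faces and traits, the loadings, and the 0.4 latent correlation are all assumptions, not the paper's data or pipeline. The point it sketches: a PCA-style analysis forces the extracted components to be orthogonal, so even when the dimensions generating the ratings are correlated, the component scores will not show that correlation.

```python
# Hypothetical illustration (not the paper's actual pipeline): why forcing
# orthogonal components can hide a correlation between underlying dimensions.
import numpy as np

rng = np.random.default_rng(0)
n_faces, n_traits = 500, 13          # illustrative sizes only

# Simulate two latent dimensions ("valence", "dominance") that are correlated.
latent_cov = np.array([[1.0, 0.4],
                       [0.4, 1.0]])
latent = rng.multivariate_normal([0, 0], latent_cov, size=n_faces)

# Each trait rating loads on the latent dimensions, plus noise.
loadings = rng.normal(0, 1, size=(2, n_traits))
ratings = latent @ loadings + rng.normal(0, 0.5, size=(n_faces, n_traits))

# Strategy 1: PCA forces the extracted components to be orthogonal,
# so their scores are uncorrelated by construction.
ratings_c = ratings - ratings.mean(axis=0)
_, _, vt = np.linalg.svd(ratings_c, full_matrices=False)
pc_scores = ratings_c @ vt[:2].T
print("PCA component score correlation:", np.corrcoef(pc_scores.T)[0, 1])  # ~0

# The dimensions that actually generated the data are clearly correlated.
print("Latent dimension correlation:", np.corrcoef(latent.T)[0, 1])        # ~0.4
```

Allowing an oblique (correlated) rotation of the extracted dimensions, as in the alternative strategy described above, is what lets this kind of correlation surface in the solution.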
Interpreting a failure to replicate is complicated by the fact that the failure could be due to the original finding being a false positive, unrecognized moderating influences between the original and replication procedures, or faulty implementation of the procedures in the replication. One strategy to maximize replication quality is involving the original authors in study design. We (N = 21 labs and N = 2,220 participants) experimentally tested whether original author involvement improved replicability of a classic finding from Terror Management Theory (Greenberg et al., 1994). Our results were non-diagnostic of whether original author involvement improves replicability, because we were unable to replicate the finding under any conditions. This suggests that the original finding was either a false positive or that the conditions necessary to obtain it are not yet understood or no longer exist. Data, materials, analysis code, preregistration, and supplementary documents can be found on the OSF page: https://osf.io/8ccnw/
Gilbert et al. conclude that evidence from the Open Science Collaboration's Reproducibility Project: Psychology indicates high reproducibility, given the study methodology. Their very optimistic assessment is limited by statistical misconceptions and by causal inferences from selectively interpreted, correlational data. Using the Reproducibility Project: Psychology data, both optimistic and pessimistic conclusions about reproducibility are possible, and neither is yet warranted.

Across multiple indicators of reproducibility, the Open Science Collaboration (1) (OSC2015) observed that the original result was replicated in ~40 of 100 studies sampled from three journals. Gilbert et al. (2) conclude that the reproducibility rate is, in fact, as high as could be expected, given the study methodology. We agree with them that both methodological differences between original and replication studies and statistical power affect reproducibility, but their very optimistic assessment is based on statistical misconceptions and selective interpretation of correlational data.

Gilbert et al. focused on a variation of one of OSC2015's five measures of reproducibility: how often the confidence interval (CI) of the original study contains the effect size estimate of the replication study. They misstated that the expected replication rate assuming only sampling error is 95%, which is true only if both studies estimate the same population effect size and the replication has infinite sample size (3, 4). OSC2015 replications did not have infinite sample size. In fact, the expected replication rate was 78.5% using OSC2015's CI measure (see OSC2015's supplementary information, pp. 56 and 76; https://osf.io/k9rnd). By this measure, the actual replication rate was only 47.4%, suggesting the influence of factors other than sampling error alone.

Within another large replication study, "Many Labs" (5) (ML2014), Gilbert et al. found that 65.5% of ML2014 studies would be within the CIs of other ML2014 studies of the same phenomenon and concluded that this reflects the maximum reproducibility rate for OSC2015. Their analysis using ML2014 is misleading and does not apply to estimating reproducibility with OSC2015's data for a number of reasons.

First, Gilbert et al.'s estimates are based on pairwise comparisons between all of the replications within ML2014. As such, for roughly half of their failures to replicate, "replications" had larger effect sizes than "original studies," whereas just 5% of OSC2015 replications had replication CIs exceeding the original study effect sizes.

Second, Gilbert et al. apply the by-site variability in ML2014 to OSC2015's findings, thereby arriving at higher estimates of reproducibility. However, ML2014's primary finding was that by-site variability was highest for the largest (replicable) effects and lowest for the smallest (nonreplicable) effects. If ML2014's primary finding is generalizable, then Gilbert et al.'s analysis may leverage by-site variability in ML2014's larger effects to exaggerate the effect of by-site […]
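The point about the 95% expectation can be checked with a small simulation; the effect size and per-group sample sizes below are hypothetical, not OSC2015's. When an original study and a replication both estimate the same true effect from finite samples, the replication's point estimate falls inside the original study's 95% CI noticeably less than 95% of the time, because the replication estimate carries its own sampling error.

```python
# Illustrative simulation (effect size and sample sizes are made up, not OSC2015's):
# even with no true difference between original and replication studies, the
# replication estimate lands inside the original 95% CI less than 95% of the time.
import numpy as np

rng = np.random.default_rng(1)
true_effect = 0.4        # hypothetical standardized mean difference
n_orig, n_rep = 40, 80   # hypothetical per-group sample sizes
n_sim = 20_000

def simulate_effect(n):
    """Estimated mean difference and its standard error for an n-per-group study."""
    a = rng.normal(true_effect, 1, size=(n_sim, n))
    b = rng.normal(0.0, 1, size=(n_sim, n))
    diff = a.mean(axis=1) - b.mean(axis=1)
    se = np.sqrt(a.var(axis=1, ddof=1) / n + b.var(axis=1, ddof=1) / n)
    return diff, se

d_orig, se_orig = simulate_effect(n_orig)
d_rep, _ = simulate_effect(n_rep)

# Does the replication's point estimate fall inside the original study's 95% CI?
captured = (d_rep > d_orig - 1.96 * se_orig) & (d_rep < d_orig + 1.96 * se_orig)
print(f"Capture rate: {captured.mean():.1%}")   # below 95%, despite identical true effects
```

With these (assumed) sample sizes the capture rate comes out around 90%, not 95%; the exact expected rate depends on the original and replication sample sizes, which is why OSC2015's own calculation gave 78.5% for its studies.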