We conducted preregistered replications of 28 classic and contemporary published findings, with protocols that were peer reviewed in advance, to examine variation in effect magnitudes across samples and settings. Each protocol was administered to approximately half of 125 samples that comprised 15,305 participants from 36 countries and territories. Using the conventional criterion of statistical significance (p < .05), we found that 15 (54%) of the replications provided evidence of a statistically significant effect in the same direction as the original finding. With a strict significance criterion (p < .0001), 14 (50%) of the replications still provided such evidence, a reflection of the extremely high-powered design. Seven (25%) of the replications yielded effect sizes larger than the original ones, and 21 (75%) yielded effect sizes smaller than the original ones. The median comparable Cohen’s ds were 0.60 for the original findings and 0.15 for the replications. The effect sizes were small (< 0.20) in 16 of the replications (57%), and 9 effects (32%) were in the direction opposite the direction of the original effect. Across settings, the Q statistic indicated significant heterogeneity in 11 (39%) of the replication effects, and most of those were among the findings with the largest overall effect sizes; only 1 effect that was near zero in the aggregate showed significant heterogeneity according to this measure. Only 1 effect had a tau value greater than .20, an indication of moderate heterogeneity. Eight others had tau values near or slightly above .10, an indication of slight heterogeneity. Moderation tests indicated that very little heterogeneity was attributable to the order in which the tasks were performed or whether the tasks were administered in lab versus online. Exploratory comparisons revealed little heterogeneity between Western, educated, industrialized, rich, and democratic (WEIRD) cultures and less WEIRD cultures (i.e., cultures with relatively high and low WEIRDness scores, respectively). Cumulatively, variability in the observed effect sizes was attributable more to the effect being studied than to the sample or setting in which it was studied.
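For readers unfamiliar with the heterogeneity measures cited in this abstract, the sketch below shows how Cochran's Q and the DerSimonian-Laird tau estimate are conventionally computed from per-sample effect sizes and standard errors. This is not the authors' analysis code, and the study-level values are invented for illustration only.

```python
import numpy as np
from scipy import stats

# Hypothetical per-sample Cohen's d values and their standard errors
# for one replicated effect -- illustrative numbers, not study data.
d = np.array([0.12, 0.20, 0.05, 0.18, 0.10, 0.25, 0.08])
se = np.array([0.07, 0.08, 0.06, 0.09, 0.07, 0.10, 0.06])

w = 1.0 / se**2                      # inverse-variance weights
d_fixed = np.sum(w * d) / np.sum(w)  # fixed-effect pooled estimate

# Cochran's Q: weighted squared deviations from the pooled estimate.
# A significant Q is read as evidence of heterogeneity across samples.
Q = np.sum(w * (d - d_fixed) ** 2)
k = len(d)
p_Q = stats.chi2.sf(Q, df=k - 1)

# DerSimonian-Laird estimate of tau^2 (between-sample variance).
# In the abstract's terms, tau > .20 indicates moderate heterogeneity
# and tau near .10 indicates slight heterogeneity.
c = np.sum(w) - np.sum(w**2) / np.sum(w)
tau2 = max(0.0, (Q - (k - 1)) / c)
tau = np.sqrt(tau2)

print(f"Q = {Q:.2f} (p = {p_Q:.3f}), tau = {tau:.3f}")
```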
Over the past 10 years, crises surrounding replication, fraud, and best practices in research methods have dominated discussions in the field of psychology. However, no research has examined how to communicate these issues to undergraduates or what effect such communication has on their attitudes toward the field. We developed and validated a 1-hr lecture communicating issues surrounding the replication crisis and current recommendations for increasing reproducibility. Pre- and post-lecture surveys suggest that the lecture serves as an excellent pedagogical tool. Following the lecture, students trusted psychological studies slightly less but saw greater similarities between psychology and the natural sciences. We discuss challenges for instructors taking the initiative to communicate these issues to undergraduates in an evenhanded way.
Dijksterhuis and van Knippenberg (1998) reported that participants primed with a category associated with intelligence ("professor") subsequently performed 13% better on a trivia test than participants primed with a category associated with a lack of intelligence ("soccer hooligans"). In two unpublished replications of this study designed to verify the appropriate testing procedures, Dijksterhuis, van Knippenberg, and Holland observed a smaller difference between conditions (2%-3%) as well as a gender difference: Men showed the effect (9.3% and 7.6%), but women did not (0.3% and -0.3%). The procedure used in those replications served as the basis for this multilab Registered Replication Report. A total of 40 laboratories collected data for this project, and 23 of these laboratories met all inclusion criteria. Here we report the meta-analytic results for those 23 direct replications (total N = 4,493), which tested whether performance on a 30-item general-knowledge trivia task differed between these two priming conditions (results of supplementary analyses of the data from all 40 labs, N = 6,454, are also reported). We observed no overall difference in trivia performance between participants primed with the "professor" category and those primed with the "hooligan" category (0.14%) and no moderation by gender.
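As a companion to the meta-analytic result reported above, here is a minimal random-effects pooling sketch showing how lab-level condition differences are combined into an overall estimate. The lab-level values are made up for illustration, and tau² is set to zero for simplicity (one would normally plug in a DerSimonian-Laird estimate such as the one in the earlier sketch); this is not the Registered Replication Report's analysis code.

```python
import numpy as np
from scipy import stats

# Hypothetical per-lab differences in percent correct between the
# "professor" and "hooligan" priming conditions -- illustrative only.
diff = np.array([0.5, -0.3, 0.9, -0.6, 0.2, 0.1, -0.4, 0.7])
se = np.array([0.8, 0.7, 0.9, 0.8, 0.6, 0.7, 0.8, 0.9])

# Random-effects weights: within-lab variance plus tau^2. With
# tau2 = 0 this collapses to an ordinary fixed-effect model.
tau2 = 0.0
w = 1.0 / (se**2 + tau2)

pooled = np.sum(w * diff) / np.sum(w)      # overall condition difference
se_pooled = np.sqrt(1.0 / np.sum(w))       # standard error of the pooled estimate
ci = pooled + np.array([-1, 1]) * stats.norm.ppf(0.975) * se_pooled
z = pooled / se_pooled
p = 2 * stats.norm.sf(abs(z))

print(f"pooled = {pooled:.2f}%, 95% CI [{ci[0]:.2f}, {ci[1]:.2f}], p = {p:.3f}")
```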