Although replication is a central tenet of science, direct replications are rare in psychology. This research tested variation in the replicability of thirteen classic and contemporary effects across 36 independent samples totaling 6,344 participants. In the aggregate, ten effects replicated consistently. One effect, imagined contact reducing prejudice, showed weak support for replicability. Two effects, flag priming influencing conservatism and currency priming influencing system justification, did not replicate. We compared whether conditions such as lab versus online administration or U.S. versus international sample predicted effect magnitudes; by and large, they did not. The results of this small sample of effects suggest that replicability depends more on the effect itself than on the sample and setting used to investigate it.

Investigating variation in replicability: A "Many Labs" Replication Project

Replication is a central tenet of science; its purpose is to confirm the accuracy of empirical findings, clarify the conditions under which an effect can be observed, and estimate the true effect size (Brandt et al., 2013; Open Science Collaboration, 2012). Successful replication of an experiment requires the recreation of the essential conditions of the initial experiment. This is often easier said than done. An enormous number of variables may influence experimental results, and yet only a few are tested. In the behavioral sciences, many effects have been observed in one cultural context but not in others. Likewise, individuals within the same society, or even the same individual at different times (Bodenhausen, 1990), may differ in ways that moderate any particular result.

Direct replication is infrequent, resulting in a published literature that sustains spurious findings (Ioannidis, 2005) and a failure to identify the eliciting conditions for an effect. While there are good epistemological reasons for assuming that observed phenomena generalize across individuals and contexts in the absence of contrary evidence, the failure to directly replicate findings is problematic for theoretical and practical reasons. Failure to identify moderators and boundary conditions of an effect may result in overly broad generalizations of true effects across situations (Cesario, 2013) or across individuals (Henrich, Heine, & Norenzayan, 2010). Similarly, overgeneralization may lead observations made in laboratory settings to be inappropriately extended to ecological contexts that differ in important ways (Henry, MacLeod, Phillips, & Crawford, 2004). Practically, attempts to closely replicate research findings can reveal important differences in what is considered a direct replication (Schmidt, 2009), thus leading to refinements of the initial theory (e.g., Aronson, 1992; Greenwald et al., 1986). Close replication can also lead to the clarification of tacit methodological knowledge that is necessary to elicit the effect of interest (Collins,...
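The lab-versus-online and U.S.-versus-international comparisons described above are moderation tests across replication sites. As a hedged illustration only, not the authors' actual analysis, a fixed-effect subgroup comparison can partition Cochran's Q into within- and between-group components; all site labels, effect estimates, and variances below are invented for the example.

import numpy as np
from scipy.stats import chi2

def cochran_q(effects, variances):
    # Cochran's Q for a set of study-level effect estimates.
    w = 1.0 / variances
    pooled = np.sum(w * effects) / np.sum(w)
    return np.sum(w * (effects - pooled) ** 2)

# Invented per-site standardized mean differences and their sampling variances.
lab_d = np.array([0.42, 0.55, 0.31, 0.48])
lab_v = np.array([0.02, 0.03, 0.02, 0.04])
web_d = np.array([0.40, 0.29, 0.51])
web_v = np.array([0.01, 0.02, 0.03])

q_total = cochran_q(np.concatenate([lab_d, web_d]), np.concatenate([lab_v, web_v]))
q_within = cochran_q(lab_d, lab_v) + cochran_q(web_d, web_v)
q_between = q_total - q_within  # 1 df for two subgroups (lab vs. online)
p_between = chi2.sf(q_between, df=1)
print(f"Q_between = {q_between:.2f}, p = {p_between:.3f}")

A non-significant Q_between is consistent with the conclusion that the sample and setting contributed little to variation in effect magnitudes.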
We conducted preregistered replications of 28 classic and contemporary published findings, with protocols that were peer reviewed in advance, to examine variation in effect magnitudes across samples and settings. Each protocol was administered to approximately half of 125 samples that comprised 15,305 participants from 36 countries and territories. Using the conventional criterion of statistical significance (p < .05), we found that 15 (54%) of the replications provided evidence of a statistically significant effect in the same direction as the original finding. With a strict significance criterion (p < .0001), 14 (50%) of the replications still provided such evidence, a reflection of the extremely high-powered design. Seven (25%) of the replications yielded effect sizes larger than the original ones, and 21 (75%) yielded effect sizes smaller than the original ones. The median comparable Cohen’s ds were 0.60 for the original findings and 0.15 for the replications. The effect sizes were small (< 0.20) in 16 of the replications (57%), and 9 effects (32%) were in the direction opposite the direction of the original effect. Across settings, the Q statistic indicated significant heterogeneity in 11 (39%) of the replication effects, and most of those were among the findings with the largest overall effect sizes; only 1 effect that was near zero in the aggregate showed significant heterogeneity according to this measure. Only 1 effect had a tau value greater than .20, an indication of moderate heterogeneity. Eight others had tau values near or slightly above .10, an indication of slight heterogeneity. Moderation tests indicated that very little heterogeneity was attributable to the order in which the tasks were performed or whether the tasks were administered in lab versus online. Exploratory comparisons revealed little heterogeneity between Western, educated, industrialized, rich, and democratic (WEIRD) cultures and less WEIRD cultures (i.e., cultures with relatively high and low WEIRDness scores, respectively). Cumulatively, variability in the observed effect sizes was attributable more to the effect being studied than to the sample or setting in which it was studied.
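For reference, the Q statistic and tau reported above are standard meta-analytic heterogeneity measures. One common formulation, Cochran's Q with the DerSimonian-Laird estimator of tau squared (shown here only as an illustration; the abstract does not state which estimator the authors used), is:

\[
Q=\sum_{i=1}^{k} w_i\left(y_i-\bar{y}_w\right)^2,\qquad w_i=\frac{1}{v_i},\qquad \bar{y}_w=\frac{\sum_i w_i y_i}{\sum_i w_i},
\]
\[
\hat{\tau}^2_{\mathrm{DL}}=\max\!\left(0,\ \frac{Q-(k-1)}{\sum_i w_i-\left(\sum_i w_i^2\right)\big/\sum_i w_i}\right),\qquad \hat{\tau}=\sqrt{\hat{\tau}^2_{\mathrm{DL}}},
\]

where y_i and v_i are the effect estimate and its sampling variance from sample i of k. Under homogeneity, Q approximately follows a chi-square distribution with k - 1 degrees of freedom, and tau is on the same scale as the effect sizes (here, Cohen's d), so a tau near .20 implies that true effects vary across samples by roughly 0.2 standard deviation units.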
Implicit preferences are malleable, but does that change last? We tested nine interventions (eight real and one sham) to reduce implicit racial preferences over time. In two studies with a total of 6,321 participants, all nine interventions immediately reduced implicit preferences. However, none were effective after a delay of several hours to several days. We also found that these interventions did not change explicit racial preferences and were not reliably moderated by motivations to respond without prejudice. Short-term malleability in implicit preferences does not necessarily lead to long-term change, raising new questions about the flexibility and stability of implicit preferences.

Keywords: attitudes, racial prejudice, implicit social cognition, malleability, Implicit Association Test

Full citation: Lai, C. K., Skinner, A. L., Cooley, E., Murrar, S., Brauer, M., Devos, T., Calanchini, J., Xiao, Y. J., Pedram, C., Marshburn, C. K., Simon, S., Blanchar, J. C., Joy-Gaba, J. A., Conway, J., Redford, L., Klein, R. A., Roussos, G., Schellhaas, F. M. H., Burns, M., Hu, X., McLean, M. C., Axt, J. R., Asgari, S., Schmidt, K., Rubinstein, R., Marini, M., Rubichi, S., Shin, J. L., & Nosek, B. A. (2016). Reducing implicit racial preferences: II. Intervention effectiveness across time. Journal of Experimental Psychology: General, 145, 1001-1016.

Reducing Implicit Racial Preferences: II. Intervention Effectiveness Across Time

Early theories of implicit social cognition suggested that implicit associations were largely stable. These claims were supported by evidence that changes in conscious belief did not lead to corresponding changes in implicit associations (e.g., Devine, 1989; Wilson, Lindsey, & Schooler, 2000). The psychologist John Bargh referred to the stability of implicit cognitions as the "cognitive monster": "Once a stereotype is so entrenched that it becomes activated automatically, there is really little that can be done to control its influence" (Bargh, 1999, p. 378). This dominant view has changed over the past fifteen years to one of implicit malleability, with many studies finding that implicit associations are sensitive to lab-based interventions (for reviews, see Blair, 2002; Gawronski & Bodenhausen, 2006; Lai, Hoffman, & Nosek, 2013). These interventions vary greatly in approach. In one, for example, participants are exposed to images of people who defy stereotypes (e.g., admired Black people / hated White people; Joy-Gaba & Nosek, 2010). In another, participants are given goals to override implicit biases (e.g., Mendoza, Gollwitzer, & Amodio, 2010; Stewart & Payne, 2008).

In most of the research on implicit association change, the short-term malleability of associations is tested by administering an implicit measure immediately after the intervention. Studies examining long-term change in implicit associations are rare. In a meta-analysis of experiments to change implicit associations (Forscher, Lai, et al., 2016), only 22 (3.7%) of 585 studies ...
The university participant pool is a key resource for behavioral research, and data quality is believed to vary over the course of the academic semester. This crowdsourced project examined time-of-semester variation in 10 known effects, 10 individual differences, and 3 data quality indicators over the course of the academic semester in 20 participant pools (N = 2,696) and with an online sample (N = 737). Weak time-of-semester effects were observed on data quality indicators, participant sex, and a few individual differences: conscientiousness, mood, and stress. However, there was little evidence for time of semester qualifying experimental or correlational effects. The generality of this evidence is unknown because only a subset of the tested effects demonstrated evidence for the original result in the whole sample. Mean characteristics of pool samples change slightly during the semester, but these data suggest that those changes are mostly irrelevant for detecting effects.

Keywords: social psychology; cognitive psychology; replication; participant pool; individual differences; sampling effects; situational effects

Many Labs 3: Evaluating participant pool quality across the academic semester via replication

University participant pools provide access to participants for a great deal of published behavioral research. The typical participant pool consists of undergraduates enrolled in introductory psychology courses that require students to complete some number of experiments over the course of the academic semester. Common variations include using other courses to recruit participants or making study participation an option for extra credit rather than a pedagogical requirement. Research-intensive universities often have a highly organized participant pool with a participant management system for signing up for studies and assigning credit. Smaller or teaching-oriented institutions often have more informal participant pools that are organized ad hoc each semester or for an individual class.

To avoid selection bias based on study content, most participant pools have procedures to avoid disclosing the content or purpose of individual studies during the sign-up process. However, students are usually free to choose the time during the semester at which they sign up to complete the studies. This may introduce a selection bias in which data collection on different dates occurs with different kinds of participants, or in different situational circumstances (e.g., the carefree semester beginning versus the exam-stressed semester end).

If participant characteristics differ across the academic semester, then the results of studies may be moderated by the time at which data collection occurs. Indeed, among behavioral researchers there are widespread intuitions, superstitions, and anecdotes about the "best" time to collect data in order to minimize error and maximize power. It is common, for example, to hear stories of an effect being obtained in the first part of the semester that then "d...
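The question of whether time of semester "qualifies" an experimental effect is a moderation test. A minimal sketch of one way such a test could be run, assuming simulated data and a simple condition-by-week interaction in an ordinary least-squares model (the variable names, data, and model below are invented for illustration, not the project's actual analysis), follows.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "condition": rng.integers(0, 2, n),  # 0 = control, 1 = treatment (invented)
    "week": rng.integers(1, 16, n),      # week of the academic semester (invented)
})
# Simulate a treatment effect whose size does not depend on week of semester.
df["dv"] = 0.4 * df["condition"] + rng.normal(0, 1, n)

model = smf.ols("dv ~ condition * week", data=df).fit()
# The condition:week coefficient tests whether the experimental effect changes
# across the semester; a non-significant estimate is consistent with "little
# evidence for time of semester qualifying experimental effects".
print(model.params["condition:week"], model.pvalues["condition:week"])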