Replications and robustness checks are key elements of the scientific method and a staple in many disciplines. However, leading journals in developmental psychology rarely include explicit replications of prior research conducted by different investigators, and few require authors to establish in their articles or online appendices that their key results are robust across estimation methods, data sets, and demographic subgroups. This article makes the case for prioritizing both explicit replications and, especially, within-study robustness checks in developmental psychology. It provides evidence on variation in effect sizes in developmental studies and documents strikingly different replication and robustness-checking practices in a sample of journals in developmental psychology and a sister behavioral science, applied economics. Our goal is not to show that any one behavioral science has a monopoly on best practices, but rather to show how journals from a related discipline address vital concerns of replication and generalizability shared by all social and behavioral sciences. We provide recommendations for promoting graduate training in replication and robustness-checking methods and for editorial policies that encourage these practices. Although some of our recommendations may shift the form and substance of developmental research articles, we argue that they would generate considerable scientific benefits for the field.

(Rosenthal & Jacobson, 1968). Just before the school year began, each of the school's 18 teachers was given the names of about five students who, based on a test administered several months before, were alleged to be "academic spurters": children with exceptional academic promise. In fact, these children had been chosen at random from the much larger set of tested students.
An IQ test administered at the end of the academic year showed that, among other results, first and second graders in the "spurter" group had larger intellectual gains than did their peers. Teachers described these spurters as having a better chance of being successful in later life and as being happier, more curious, and more interesting than were other children. These results, published in the 1968 book Pygmalion in the Classroom, were widely discussed and bitterly disputed, and they inspired changes in classroom practice.
Replication studies quickly appeared, some of which attempted to exactly reproduce the original Pygmalion study conditions, while others explored the robustness of the original results to variations in the context in which the original experiment was conducted. Some of these studies replicated the original Pygmalion effects, while others did not. In 1984, the 18 high-quality published studies on this topic were subjected to a meta-analysis (Raudenbush, 1984). The results showed a clear pattern in which studies that misled teachers before they had much contact with students produced much larger effects (d = +0.23), on average, than cognitive dissonance-invoking studies that tried to mislead teachers aft...