Jerzy Neyman declared, “Models become plausible by repetition.” What he called repetition is now known as model replication: applying the same statistical model to data from different studies. Typically, researchers rely on goodness-of-fit, an index of how well the model represents the data, as evidence of successful replication. But we demonstrate that goodness-of-fit testing fails to capture important (dis)similarities between studies. As an alternative, we propose prior predictive similarity checking: a method for rigorously comparing the data patterns and/or model parameters of different studies. We illustrate this method using psychopathology data from the National Comorbidity Survey (NCS) and its purported replication, the NCS-R. The model fit both data sets well, yet the studies frequently failed our more demanding similarity checks. We conclude with recommendations for applied model replication research.
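To make the general idea concrete, the sketch below shows one minimal form of a prior predictive similarity check for a single summary statistic: results from an original study induce a prior, that prior generates a predictive distribution of replication-sized data sets, and the observed replication statistic is located within that distribution. This is an illustrative assumption-laden toy, not the paper's actual procedure; the normal model and all numbers (mu_orig, sd_orig, n_rep, obs_stat) are hypothetical placeholders.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical original-study estimates for a single parameter
# (e.g., a factor correlation): point estimate and uncertainty.
mu_orig, sd_orig = 0.50, 0.05
n_rep = 300  # assumed replication sample size

# Step 1: draw parameter values from the prior implied by the
# original study's results.
theta_draws = rng.normal(mu_orig, sd_orig, size=5000)

# Step 2: for each draw, simulate a replication-sized data set and
# record a summary statistic (here, the sample mean of a normal
# outcome whose mean is the drawn parameter).
sim_stats = np.array([
    rng.normal(theta, 1.0, size=n_rep).mean() for theta in theta_draws
])

# Step 3: locate the observed replication statistic in the prior
# predictive distribution; an extreme position signals dissimilarity
# between studies even when both fit the model well.
obs_stat = 0.62  # hypothetical replication estimate
p = np.mean(sim_stats >= obs_stat)
print(f"prior predictive p = {min(p, 1 - p) * 2:.3f}")  # two-sided
```

The contrast with goodness-of-fit is that the check above asks whether the replication's results are plausible given the original study, not merely whether each data set is adequately described by the model in isolation.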