To investigate the effect of innovations in the teaching–learning environment, researchers often compare study results from different cohorts across years. However, variance in scores can stem from both random fluctuation and systematic changes due to the innovation, which complicates cohort comparisons. In the present study, we illustrate how information about the variation in course grades over time can help researchers and practitioners better compare the grades and pass rates of different cohorts of students. To this end, all 375,093 grades from all 40,087 first‐year students at a Dutch university during a period of six consecutive years were examined. Overall, about 17% of the variation in grades could be attributed to random variation between years and courses; for passing courses, this percentage was almost 40%. When this random variation is ignored, nonsignificant improvements in grades can be flagged as highly significant, leading to an overrepresentation of significant effects in the educational literature. As a consequence, too many educational innovations are claimed to be effective.