“…With that said, any generalizations from this small literature are challenging for many reasons: These few studies are so methodologically diverse that effect sizes might vary systematically with aspects of the study design, such as subject sample, video topic and length, number of thought probes, thought-probe format, number of interpolated tests and their format, interpolated-test difficulty, allowing or not allowing notetaking, posttest retention interval and difficulty, and extent of subjects’ prior knowledge on the lecture topic. Future research on the effect of interpolated testing on TUTs should thus take designing-for-variation and meta-analytic approaches to estimating effect size and its robustness (e.g., Baribault et al., 2018; Brunswik, 1955; Fyfe et al., 2021; Greenwald et al., 1986; Harder, 2020; Landy et al., 2020).…”