In today’s society, mathematical competence is one of the most important skills taught in school. To improve children’s mathematical skills, existing interventions and trainings in mathematical learning vary along several dimensions. They address different proficiency levels and age groups, take place in different settings, focus on a single task or on a set of different tasks, run for different durations, and target different types of numerical content. However, such trainings are often evaluated insufficiently. In this review, we derive four evaluation criteria and apply them in a meta-analysis of the mathematical intervention literature: (i) evaluation with the actual target group, (ii) evaluation in comparison to a performance-matched control group, (iii) evaluation in comparison to a comparable alternative intervention, and (iv) separate evaluation of subcomponents in the case of multi-componential approaches. Based on these criteria, we review current intervention approaches, paying particular attention to how they were evaluated. A meta-analysis of 39 effect sizes extracted from 37 studies revealed a reliable impact of three of the four proposed evaluation criteria on the reported efficacy of an intervention. In contrast, sample and methodological characteristics such as participants’ grade level or training duration were not associated with effect sizes. These data indicate that the reported efficacy of an intervention in mathematical learning may depend not only on the type of intervention conducted but also on the thoroughness of the evaluation procedure.