This article examines whether null randomised controlled trial (RCT) findings can, by themselves, be a secure indicator of programme failure. It does so by drawing on teacher case studies from the evaluation of the Integrated Group Reading (IGR) programme. The case studies illustrate how the same intervention can be implemented differently in local circumstances, with different outcomes. The different ways in which IGR was implemented reflect how teachers experienced the pressures of the national curriculum, their attitudes to the IGR approach to reading, the school ethos, and the resources and support available; they also point to how IGR use might be enhanced to produce more significant reading gains. The article argues that, in addition to the statistical findings, evaluators ought to pay attention to the context in which a programme is implemented, especially for complex interventions trialled in real classrooms. It also concludes that it is preferable to avoid asking whether a programme works for everyone and under all circumstances. A focus on the different ways that programmes work under different circumstances, and when implemented by different people, is a more useful perspective. This might not provide the certainty that policy-makers would prefer, but it better captures the complexity of evaluating teaching programmes.