Although evaluators often use an interrupted time series (ITS) design to test hypotheses about program effects, there are few empirical tests of the design's validity. We take a randomized experiment on an educational topic and compare its effects to those from a comparative ITS (CITS) design that uses the same treatment group as the experiment but a nonequivalent comparison group that is assessed at six time points before treatment. We estimate program effects with and without matching of the comparison schools, and we also systematically vary the number of pretest time points in the analysis. CITS designs produce impact estimates that are extremely close to the experimental benchmarks and, as implemented here, do so equally well with and without matching. Adding time points provides an advantage so long as the pretest trend differences in the treatment and comparison groups are correctly modeled. Otherwise, more time points can increase bias.
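The finding that extra pretest time points help only when the trend difference is correctly modeled can be illustrated with a small simulation. The sketch below is a simplified, group-level stand-in, not the authors' estimator. It fits a separate ordinary least-squares linear trend to each group's pretest means, projects each trend to one posttest period, and takes the difference between the two groups' deviations from their projections. The data are hypothetical and contain no true effect. The treatment group's pretest trend is quadratic, so a linear model is misspecified. The names `linear_projection` and `cits_linear` are illustrative only.

```python
from statistics import mean


def linear_projection(times, values, target_time):
    """Fit values ~ a + b * time by ordinary least squares; return the fit at target_time."""
    t_bar, y_bar = mean(times), mean(values)
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, values))
    return y_bar + (sxy / sxx) * (target_time - t_bar)


def cits_linear(pre_times, post_time, treat_pre, treat_post, comp_pre, comp_post, k):
    """CITS-style estimate using only the last k pretest points for each group's linear trend.

    Impact = (treatment posttest minus its projected trend)
             - (comparison posttest minus its projected trend).
    """
    if k < 2:
        raise ValueError("a linear trend needs at least two pretest points")
    times = pre_times[-k:]
    treat_gap = treat_post - linear_projection(times, treat_pre[-k:], post_time)
    comp_gap = comp_post - linear_projection(times, comp_pre[-k:], post_time)
    return treat_gap - comp_gap


# Hypothetical group means with NO true effect: the treatment group's pretest
# trend is quadratic (t**2) and simply continues at the posttest; the comparison
# group is flat. A linear trend model is therefore misspecified for the treatment group.
pre_times = [1, 2, 3, 4, 5, 6]
treat_pre = [t ** 2 for t in pre_times]
comp_pre = [10] * len(pre_times)
post_time, treat_post, comp_post = 7, 7 ** 2, 10

for k in range(2, len(pre_times) + 1):
    estimate = cits_linear(pre_times, post_time, treat_pre, treat_post, comp_pre, comp_post, k)
    print(f"pretest points used: {k}, estimated effect: {estimate:.2f}")
```

With these numbers the spurious estimate rises from 2.00 with two pretest points to about 9.33 with all six. Each earlier point pulls the fitted line further from the accelerating trend. In this toy example, adding a quadratic term to the trend model would recover the true effect of zero. This is the sense in which more time points help only when the trend is modeled correctly.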
This article implies that sharp inferences to large populations from small experiments are difficult even with probability sampling. Features of random samples should be kept in mind when evaluating the extent to which results from experiments conducted on nonrandom samples might generalize.
We explore the conditions under which short comparative interrupted time-series (CITS) designs represent valid alternatives to randomized experiments in educational evaluations. To do so, we conduct three within-study comparisons, each of which uses a unique data set to test the validity of the CITS design by comparing its causal estimates to those from a randomized controlled trial (RCT) that shares the same treatment group. The degree of correspondence between RCT and CITS estimates depends on the observed pretest time trend differences and how they are modeled. Where the trend differences are clear and can be easily modeled, no bias results; where the trend differences are more volatile and cannot be easily modeled, the degree of correspondence is more mixed, and the best results come from matching comparison units on both pretest and demographic covariates.
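The abstract reports that, where trends are volatile, the best results came from matching comparison units on both pretest and demographic covariates. It does not describe the matching method. The sketch below therefore assumes one common choice: greedy one-to-one nearest-neighbor matching without replacement on covariates scaled by their standard deviations. The function `match_comparisons`, the field names `pre_mean` and `pct_low_income`, and the school records are all hypothetical.

```python
from math import sqrt
from statistics import pstdev


def match_comparisons(treated, pool, keys):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    Distance is Euclidean over the covariates in `keys`, each divided by its
    standard deviation across treated and candidate units, so pretest scores and
    demographic percentages contribute on a comparable scale.
    """
    if len(pool) < len(treated):
        raise ValueError("need at least as many candidate comparisons as treated units")
    everyone = treated + pool
    # Fall back to 1.0 when a covariate does not vary, to avoid division by zero.
    scales = {k: pstdev([u[k] for u in everyone]) or 1.0 for k in keys}

    def distance(a, b):
        return sqrt(sum(((a[k] - b[k]) / scales[k]) ** 2 for k in keys))

    available = list(pool)
    pairs = []
    for unit in treated:
        # Take the closest remaining candidate, then remove it from the pool.
        best = min(available, key=lambda c: distance(unit, c))
        available.remove(best)
        pairs.append((unit["id"], best["id"]))
    return pairs


# Hypothetical school records: a pretest mean score and a demographic covariate.
keys = ["pre_mean", "pct_low_income"]
treated = [
    {"id": "T1", "pre_mean": 60.0, "pct_low_income": 40.0},
    {"id": "T2", "pre_mean": 45.0, "pct_low_income": 80.0},
]
pool = [
    {"id": "C1", "pre_mean": 59.0, "pct_low_income": 42.0},
    {"id": "C2", "pre_mean": 46.0, "pct_low_income": 78.0},
    {"id": "C3", "pre_mean": 70.0, "pct_low_income": 10.0},
]
print(match_comparisons(treated, pool, keys))
```

Here each treated school is paired with the candidate closest to it on both covariates (T1 with C1, T2 with C2), and C3 goes unused. Greedy matching depends on the order in which treated units are processed. Optimal matching or propensity-score methods are alternatives when that order matters.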
Short comparative interrupted time series (CITS) designs are increasingly being used in education research to assess the effectiveness of school-level interventions. These designs can be implemented relatively inexpensively, often drawing on publicly available data on aggregate school performance. However, the validity of this approach hinges on a variety of assumptions and design decisions that are not clearly outlined in the literature. This article aims to serve as a practice guide for applied researchers deciding whether and how to use this approach. We begin by providing an overview of the assumptions needed to estimate causal effects using school-level data, the common threats to validity faced in practice, and which effects can and cannot be estimated using school-level data. We then examine two analytic decisions researchers face when implementing the design: correctly modeling the pretreatment functional form (that is, the preintervention trend) and selecting comparison cases. Next, we illustrate the use of this design in practice, drawing on data from the implementation of the School Improvement Grant (SIG) program in Ohio. We conclude with advice for applied researchers implementing this design.
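The following sketch shows why the choice of pretreatment functional form matters. It compares two baselines on the same hypothetical data. One is a pretest mean, which gives a difference-in-differences-style estimate. The other is a linear pretest trend fitted separately for each group. This is an illustrative simplification at the level of group means, not the procedure used in the Ohio SIG analysis. The functions `project_baseline` and `cits_estimate` and the data are hypothetical.

```python
from statistics import mean


def project_baseline(pre_times, pre_values, post_time, form):
    """Project a group's pretest baseline to post_time under an assumed functional form.

    form="mean":   baseline is the pretest average (no trend).
    form="linear": baseline is an ordinary least-squares linear trend in time.
    """
    if form == "mean":
        return mean(pre_values)
    if form == "linear":
        t_bar, y_bar = mean(pre_times), mean(pre_values)
        sxx = sum((t - t_bar) ** 2 for t in pre_times)
        sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(pre_times, pre_values))
        return y_bar + (sxy / sxx) * (post_time - t_bar)
    raise ValueError(f"unknown functional form: {form!r}")


def cits_estimate(pre_times, post_time, treat_pre, treat_post, comp_pre, comp_post, form):
    """Difference between the groups' deviations from their own projected baselines."""
    treat_gap = treat_post - project_baseline(pre_times, treat_pre, post_time, form)
    comp_gap = comp_post - project_baseline(pre_times, comp_pre, post_time, form)
    return treat_gap - comp_gap


# Hypothetical school-level means: the groups grow at different rates before
# treatment (2 and 1 points per period). At the posttest the treated group sits
# 5 points above its own trend and the comparison group 1 point above its trend.
pre_times = [1, 2, 3, 4, 5, 6]
treat_pre = [50 + 2 * t for t in pre_times]
comp_pre = [40 + 1 * t for t in pre_times]
post_time, treat_post, comp_post = 7, 69, 48

for form in ("mean", "linear"):
    print(form, cits_estimate(pre_times, post_time, treat_pre, treat_post, comp_pre, comp_post, form))
```

The linear-trend model returns 4.0, the effect built into these data. The mean baseline returns 7.5 because it credits the program with the groups' pretest slope difference: 1 point per period, accumulated over the 3.5 periods between the average pretest time and the posttest.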