This paper summarizes results from 12 empirical evaluations of observational methods in education contexts. We examine the performance of three common covariate types in observational studies where the outcome is a standardized reading or math test: pretest measures, local geographic matching, and rich covariate sets with a strong theory of treatment selection. Overall, the review demonstrates that although the pretest often reduces bias in observational studies, it does not always eliminate it. Its performance depends on the pretest's correlation with treatment selection and the outcome, and on whether pre-intervention trends are present. We also find that although local comparisons are prioritized for matching, their performance depends on whether comparable no-treatment cases are available; when they are not, local comparisons may produce badly biased results. In cases where researchers have a strong theory of selection and rich covariate sets, observational methods perform well, but additional replication studies are needed. Finally, observational methods that rely on demographic covariates without a theory of selection rarely produce unbiased treatment effects. The paper concludes by offering education researchers empirically based guidance on covariate selection in observational studies.