Evaluation is an important component of developing educational software. Ideally, such evaluation quantifies and characterizes the effects of a new educational intervention on learning processes and outcomes. Conducting meaningful and rigorous educational evaluation is difficult, however. Challenges include defining and measuring educational outcomes, accounting for media effects, coping with practical problems in study design, and asking the right research questions. Practical considerations that complicate the design of evaluation studies include confounding, potentially small effect sizes, contamination effects, and ethics. Two distinct approaches to evaluation, objectivist and subjectivist, complement each other in describing the full range of effects a new educational program can have. Objectivist demonstration studies should be preceded by measurement studies that assess the reliability and validity of the evaluation instrument(s) used. Many evaluation studies compare the performance of learners exposed to either the new program or a more traditional approach. This method is problematic, however, because test or exam performance is often a weak indicator of competence and may fail to capture important nuances in outcomes. Subjectivist studies are more qualitative in nature and may provide insights complementary to those gained with objectivist studies. Several published examples illustrate the different evaluation methods, and readers are encouraged to consider a wide range of evaluation study designs and to explore increasingly complex questions when evaluating educational software.
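
To make the notion of a measurement study concrete: one common reliability check for an evaluation instrument is Cronbach's alpha, an internal-consistency coefficient. The article itself contains no code; the sketch below is a minimal illustration, and the function name and the four-item quiz data are hypothetical assumptions introduced here, not taken from the article.

    import numpy as np

    def cronbach_alpha(scores: np.ndarray) -> float:
        """Cronbach's alpha for a (respondents x items) score matrix."""
        k = scores.shape[1]                          # number of items
        item_vars = scores.var(axis=0, ddof=1)       # per-item sample variance
        total_var = scores.sum(axis=1).var(ddof=1)   # variance of total scores
        return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

    # Hypothetical data: 6 learners rated on 4 quiz items (0-5 scale)
    ratings = np.array([
        [4, 5, 4, 5],
        [3, 4, 3, 4],
        [5, 5, 4, 5],
        [2, 3, 2, 3],
        [4, 4, 5, 4],
        [3, 3, 3, 2],
    ])
    print(f"Cronbach's alpha = {cronbach_alpha(ratings):.2f}")  # ≈ 0.92

An alpha near 0.9, as in this toy example, would suggest the items measure a common construct consistently; a low alpha would argue against using the instrument in a subsequent demonstration study until it is revised.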