According to popular belief, test takers should trust their first instinct and keep their initial responses when they have the opportunity to review test items. More than 80 years of empirical research on item review, however, has contradicted this belief and shown small but consistently positive score gains for test takers who changed answers they found to be incorrect during review. This study reexamined the benefits of answer changes using item response theory modeling of the probability of an answer change as a function of the test taker's ability level and the properties of the items. Our empirical results support the popular belief and reveal substantial losses due to changing initial responses at all ability levels. Both the earlier research's contradiction of the popular belief and our support for it are explained as a manifestation of Simpson's paradox in statistics.
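The Simpson's paradox explanation in the abstract above can be made concrete with a small sketch. The counts below are invented purely for illustration and are not the study's data. The two ability strata are also a simplifying assumption standing in for the study's item response theory conditioning. The sketch only shows how changed answers can look better than kept answers in pooled data while looking worse within every stratum.

```python
from fractions import Fraction

# Hypothetical (correct, total) counts per stratum and action.
# These numbers are invented for illustration; they are NOT from the study.
counts = {
    "high ability": {"changed": (700, 1000), "kept": (80, 100)},
    "low ability": {"changed": (20, 100), "kept": (300, 1000)},
}


def rate(correct, total):
    """Exact proportion of correct final responses."""
    return Fraction(correct, total)


def pooled_rate(action):
    """Proportion correct for an action, ignoring the ability stratum."""
    correct = sum(group[action][0] for group in counts.values())
    total = sum(group[action][1] for group in counts.values())
    return Fraction(correct, total)


# Within each stratum: does changing do worse than keeping?
changing_worse_within = {
    name: rate(*group["changed"]) < rate(*group["kept"])
    for name, group in counts.items()
}

# Pooled over strata: does changing appear to do better than keeping?
changing_better_pooled = pooled_rate("changed") > pooled_rate("kept")

for name, group in counts.items():
    print(name, float(rate(*group["changed"])), float(rate(*group["kept"])))
print("pooled", float(pooled_rate("changed")), float(pooled_rate("kept")))
```

In this made-up example the reversal arises because most changed answers come from the stratum with the higher success rate, so the pooled comparison mostly reflects who changes answers rather than what changing does.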
Items on test score scales located at and below the Proficient cut score define the content area knowledge and skills required to achieve proficiency. Put differently, examinees who perform at the Proficient level on a test can be expected to demonstrate that they have mastered most of the knowledge and skills represented by the items at and below the Proficient cut score. It is important that these items define the intended knowledge and skills, and especially increasing levels of knowledge and skills, on tests that are intended to portray achievement growth across grade levels. Previous studies show that coherent definitions of growth often occur as a result of good fortune rather than by design. In this paper, we use grades 3, 4, and 5 mathematics tests from a state assessment program to examine how coherently (a) descriptors for Proficient performance and (b) the knowledge and skill demands of the test items that define Proficient performance at each grade level define achievement growth across grades. Our purpose is to demonstrate (a) the results of one state assessment program's first attempt to train item writers to hit assigned proficiency level targets, and (b) how those efforts support and undermine coherent inferences about what it means to achieve Proficient performance from one grade to the next. Item writers’ accuracy in hitting proficiency level targets and the resulting inferences about achievement growth are mixed but promising.