Standards-based score reports interpret test performance with reference to cut scores defining categories like "below basic" or "proficient" or "master." This paper first develops a conceptual framework for validity arguments supporting such interpretations, then presents three applications. Two of these serve to introduce new standard-setting methods.
Standards-based reporting has come into common use for educational assessments in the United States. Cut scores define performance levels for individual students, and the proportions of students at or above successive levels are reported for schools, states, and the entire nation. The broad appeal of such reporting is not surprising. Labels like "Basic," "Proficient," or "Advanced" seem to convey whether students are making satisfactory progress, whether schools are doing well enough, and in what subject areas students are most in need of improvement. In this standards-based era, it no longer seems sufficient to know whether annual scores are up or down; reporting in terms of quantified goals is called for to say how much better would be good enough. In the name of "ending social promotion," score thresholds are set to indicate which children need remediation and which are ready for the next grade level. And, at the individual student level, dozens of states have enacted requirements for high school exit examinations. Many of these standards have serious consequences; the potential for mischief is great if they are set capriciously. Because these cut scores define the decision rules according to which test scores are interpreted and used, the Standards for Educational and Psychological Testing state that "the validity of test interpretations may hinge on the cut scores" (American Educational Research Association, American Psychological Association, & National Council on Measurement in Education, 1999, p. 53).
The conceptual framework lays out the logic of validity arguments in support of standards-based score interpretations, focusing on requirements that the performance standard (i.e., the characterization of examinees who surpass the cut score) be defensible both as a description and as a normative judgment.
Unfortunately, standards-based score reporting and student certification may be yet another case of political expectations outpacing measurement realities. It is arguable whether many such score interpretations are technically defensible. And while enthusiasm continues unabated, current standard-setting methods have been strenuously criticized (e.g., Berk, 1995; Glass, 1978; Jaeger, Mullis, Bourque, & Shakrani, 1996; National Academy of Education Panel on the Evaluation of the NAEP Trial State Assessment, 1993; Pellegrino, Jones, & Mitchell, 1999).
This paper is divided into two major sections, the first presenting a conceptual framework and the second applying that framework to three standard-setting problems. After introducing the idea of validity argument, the first section takes up the related requirements for (1) a performance standard that desc...