The American College Testing Program

Nedelsky (1954) and Angoff (1971) have suggested procedures for establishing a cutting score based on raters' judgments about the likely performance of minimally competent examinees on each item in a test. In this paper generalizability theory is used to characterize and quantify expected variance in cutting scores resulting from each procedure. Experimental test data are used to illustrate this approach and to compare the two procedures. Consideration is also given to the impact of rater disagreement on some issues of measurement reliability or dependability. Results suggest that the differences between the Nedelsky and Angoff procedures may be of greater consequence than their apparent similarities. In particular, the restricted nature of the Nedelsky (inferred) probability scale may constitute a basis for seriously questioning the applicability of this procedure in certain contexts.

Currently there is considerable debate concerning procedures for setting passing standards, or cutting scores, when scores on tests are used to make certain types of decisions (see, for example, National Council on Measurement in Education, 1978). Meskauskas (1976), Buck (1977), and Zieky and Livingston (1977), among others, have reviewed some current procedures for establishing cutting scores. For the most part these procedures can be grouped into two categories: procedures that use raters' subjective judgments and procedures that use examinee scores on the test itself and/or some criterion measure. The latter category of procedures is not discussed in this paper; rather, this paper examines two procedures suggested by Nedelsky (1954) and Angoff (1971) for establishing cutting scores based upon raters' judgments.

Nedelsky and Angoff Procedures

Both of these procedures require judgments by raters concerning the performance of hypothetical "minimally competent" examinees on each item of a test.
Using Nedelsky's procedure, raters are asked to identify, for each item, those distractors that a minimally competent examinee would eliminate as incorrect. The reciprocal of the number of remaining alternatives (including the correct answer) serves as an estimate of the probability that a "minimally competent" examinee would get the item correct. In Angoff's procedure, raters simply provide an estimate of the item probabilities with
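The Nedelsky computation described above can be sketched in a few lines of code. The function names and the example judgments below are illustrative, not taken from the paper; the only rule implemented is the one the text states, namely that the inferred item probability is the reciprocal of the alternatives remaining after the rater eliminates distractors.

```python
def nedelsky_item_probability(n_options, n_eliminated):
    """Inferred probability that a minimally competent examinee answers
    the item correctly, per Nedelsky (1954): the reciprocal of the number
    of alternatives remaining (including the correct answer) after the
    rater eliminates distractors such an examinee would rule out."""
    remaining = n_options - n_eliminated
    if remaining < 1:
        raise ValueError("at least the correct answer must remain")
    return 1.0 / remaining

def nedelsky_cutting_score(judgments):
    """Cutting score for one rater: the expected number-correct score of a
    minimally competent examinee, i.e., the sum of the item probabilities.
    `judgments` is a list of (n_options, n_eliminated) pairs, one per item."""
    return sum(nedelsky_item_probability(n, e) for n, e in judgments)

# Hypothetical three-item test of four-option items, where one rater
# eliminates 2, 1, and 3 distractors, giving probabilities 1/2, 1/3, and 1.
score = nedelsky_cutting_score([(4, 2), (4, 1), (4, 3)])
print(score)  # 1/2 + 1/3 + 1
```

Note that the inferred probabilities can take only the values 1/k for integer k, which is the restricted probability scale the paper identifies as a potential weakness of the Nedelsky procedure; an Angoff rater, by contrast, may state any probability directly.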