Several methods have been proposed for estimating the reliability of a criterion-referenced test. This paper describes and compares seven procedures which can be applied to the more general case of proficiency tests that are scored with latent structure models. Results suggest that the predictive estimate is the most accurate of the procedures.

EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 1981, 41

Wilcox (1979a) describes a proficiency test as consisting of $n$ skills with $k \geq 1$ items per skill. Currently such tests are being constructed by various city and state agencies throughout the United States to aid in deciding whether an examinee will be awarded a high school diploma or be advanced to the next grade level.

For a specific examinee, let $\tau$ represent the proportion of skills in the domain of skills that he/she has acquired. Based on the observed responses of the examinee, let $\hat{\tau}$ be an estimate of $\tau$. If $\hat{\tau}$ is greater than or equal to some predetermined constant, $\tau_0$, the examinee is said to have passed the test; otherwise, he/she fails. For comments on how to determine $\tau_0$, see Hambleton, et al. (1978), Glass (1978), and Wilcox (1980). One approach to characterizing such tests is to estimate the probability that, for the typical examinee, $\hat{\tau} \geq \tau_0$ when $\tau \geq \tau_0$ or that $\hat{\tau} < \tau_0$ when $\tau < \tau_0$. For the special case of one item per skill (i.e., $k = 1$), these probabilities can be estimated when the binomial error model (e.g., Lord and Novick, 1968, Chapter 23) or the Dirichlet-multi-