1980
DOI: 10.1177/014662168000400402

Determining the Length of a Criterion-Referenced Test

Abstract: When determining how many items to include on a criterion-referenced test, practitioners must resolve various nonstatistical issues before a particular solution can be applied. A fundamental problem is deciding which of three true scores should be used. The first is based on the probability that an examinee is correct on a "typical" test item. The second is the probability of having acquired a typical skill among a domain of skills, and the third is based on latent trait models. Once a particular true score is…

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
1
1

Citation Types

0
15
0

Year Published

1981
1981
2014
2014

Publication Types

Select...
3
2
2

Relationship

1
6

Authors

Journals

Cited by 22 publications (15 citation statements)
References 66 publications
“…One appealing feature of this method of scoring a test, particularly for the case of multiple-choice test items, is that the resulting estimate of true score is corrected for guessing without assuming guessing is at random. In addition, results reported by Wilcox (1980) suggest that for some situations, scoring proficiency tests with latent structure models might substantially reduce the number of items on the test that would otherwise be needed to obtain the same level of accuracy when comparing ζ to ζ₀. If, however, latent structure models are used to score a test, existing single-administration estimates of reliability are no longer appropriate because, as will become evident, we must estimate a multinomial distribution rather than the simpler binomial probability function. The purpose of this paper is to describe and compare various solutions to this problem.…”
mentioning
confidence: 85%
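To make the contrast in that excerpt concrete, here is a minimal sketch (my own illustration, not from the cited papers) of why latent-structure scoring changes the estimation problem: the binomial view has a single proportion-correct parameter, while a latent-structure (multinomial) view requires estimating a whole vector of response-category probabilities. The category labels and probability values below are hypothetical.

```python
# Minimal sketch (hypothetical values): binomial vs. multinomial estimation.
import numpy as np

rng = np.random.default_rng(0)
n = 20  # items on the test

# Binomial model: the only parameter is the proportion-correct true score zeta.
zeta = 0.7                            # hypothetical true score
x = rng.binomial(n, zeta)             # observed number-correct score
zeta_hat = x / n                      # ML estimate of the single parameter

# Latent-structure view: each response falls into one of several categories
# (e.g., knows the answer / guesses correctly / answers incorrectly), so a
# whole vector of category probabilities must be estimated.
p = np.array([0.55, 0.15, 0.30])      # hypothetical category probabilities
counts = rng.multinomial(n, p)        # observed category counts
p_hat = counts / n                    # ML estimate of the probability vector

print(f"binomial estimate of zeta: {zeta_hat:.2f}")
print("multinomial estimates:    ", np.round(p_hat, 2))
```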
“…If ζ̂ is greater than or equal to some predetermined constant, ζ₀, the examinee is said to have passed the test; otherwise, he/she fails. For comments on how to determine ζ₀, see Hambleton et al. (1978), Glass (1978), and Wilcox (1980). One approach to characterizing such tests is to estimate the probability that, for the typical examinee, ζ̂ ≥ ζ₀ when ζ ≥ ζ₀, or that ζ̂ < ζ₀ when ζ < ζ₀. For the special case of one item per skill (i.e., k = 1), these probabilities can be estimated when the binomial error model (e.g., Lord and Novick, 1968, Chapter 23) or the Dirichlet-multi-…”
mentioning
confidence: 99%
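The decision-accuracy probability described in the excerpt above takes a simple form under the binomial error model it names: if an examinee's true proportion-correct is ζ and the passing score is ζ₀, the chance that an n-item test classifies the examinee correctly is a binomial tail probability. The sketch below is my own illustration of that calculation; the values of ζ, ζ₀, and n, and the function name, are hypothetical.

```python
# Minimal sketch: probability of a correct pass/fail decision under the
# binomial error model, for a true score zeta and passing score zeta0.
from math import ceil
from scipy.stats import binom

def p_correct_decision(zeta: float, zeta0: float, n: int) -> float:
    """Probability the observed proportion correct falls on the same side
    of the passing score zeta0 as the true score zeta."""
    cutoff = ceil(n * zeta0)                  # minimum number correct to pass
    if zeta >= zeta0:
        return binom.sf(cutoff - 1, n, zeta)  # P(X >= cutoff)
    return binom.cdf(cutoff - 1, n, zeta)     # P(X < cutoff)

# Example: true score 0.85, passing score 0.80, tests of increasing length.
for n in (10, 20, 40):
    print(n, round(p_correct_decision(0.85, 0.80, n), 3))
```

Longer tests raise the probability of a correct classification, which is the trade-off at the heart of determining test length.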
“…Thus the course may cover many distinct topics and involve rote learning and complex argument, calculation and non-quantitative description, lectures and laboratory work, good teaching and under-emphasis. The Rasch model, and item response theory generally, are then inappropriate (Wilcox, 1980; Burton, 2004b). Indeed, typical academic tests of the kinds discussed here conform much more to the Posey model even though the facts and ideas to be tested are usually neither independent nor equivalent in difficulty and utility.…”
Section: Two Contrasting Types Of Test
mentioning
confidence: 98%
“…Indeed, individual items may differ in inherent difficulty, and student knowledge may be patchy. However, there is one particular situation in which these complications are irrelevant, namely that in which the knowledge domain is sampled in a completely random way (Wilcox, 1980). Few tests do this.…”
Section: When Is the Posey Model Valid?
mentioning
confidence: 99%
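As a small illustration of the point about random sampling (my own sketch, not from Burton or Wilcox), the simulation below draws items completely at random from a large hypothetical domain; the resulting number-correct scores match the binomial model's mean and variance even though the examinee's knowledge of individual items is all-or-nothing and patchy.

```python
# Minimal sketch (hypothetical domain): random item sampling yields binomial scores.
import numpy as np

rng = np.random.default_rng(1)

# An examinee who knows roughly 60% of a 10,000-item domain, in an
# all-or-nothing, patchy way.
domain = rng.random(10_000) < 0.6
p = domain.mean()                      # true proportion of the domain known

n, reps = 25, 50_000
idx = rng.integers(0, domain.size, size=(reps, n))   # random item draws per test
scores = domain[idx].sum(axis=1)                     # number correct on each test

print("simulated mean / variance:", scores.mean().round(2), scores.var().round(2))
print("binomial  mean / variance:", round(n * p, 2), round(n * p * (1 - p), 2))
```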