1980
DOI: 10.1177/014662168000400402

Determining the Length of a Criterion-Referenced Test

Abstract: When determining how many items to include on a criterion-referenced test, practitioners must resolve various nonstatistical issues before a particular solution can be applied. A fundamental problem is deciding which of three true scores should be used. The first is based on the probability that an examinee is correct on a "typical" test item. The second is the probability of having acquired a typical skill among a domain of skills, and the third is based on latent trait models. Once a particular true score is…

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
1
1

Citation Types

0
15
0

Year Published

1981
1981
2014
2014

Publication Types

Select...
3
2
2

Relationship

1
6

Authors

Journals

Cited by 22 publications (15 citation statements)
References 66 publications
“…One appealing feature of this method of scoring a test, particularly for the case of multiple-choice test items, is that the resulting estimate of true score is corrected for guessing without assuming guessing is at random. In addition, results reported by Wilcox (1980) suggest that for some situations, scoring proficiency tests with latent structure models might substantially reduce the number of items on the test that would otherwise be needed to obtain the same level of accuracy when comparing ζ to ζ₀. If, however, latent structure models are used to score a test, existing single-administration estimates of reliability are no longer appropriate because, as will become evident, we must estimate a multinomial distribution rather than the simpler binomial probability function. The purpose of this paper is to describe and compare various solutions to this problem.…”
mentioning
confidence: 85%
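To make the contrast in that excerpt concrete, here is a minimal sketch (my own illustration, not from the cited papers) of why latent-structure scoring changes the estimation problem: the binomial view has a single proportion-correct parameter, while a latent-structure (multinomial) view requires estimating a whole vector of response-category probabilities. The category labels and probability values below are hypothetical.

```python
# Minimal sketch (hypothetical values): binomial vs. multinomial estimation.
import numpy as np

rng = np.random.default_rng(0)
n = 20  # items on the test

# Binomial model: the only parameter is the proportion-correct true score zeta.
zeta = 0.7                            # hypothetical true score
x = rng.binomial(n, zeta)             # observed number-correct score
zeta_hat = x / n                      # ML estimate of the single parameter

# Latent-structure view: each response falls into one of several categories
# (e.g., knows the answer / guesses correctly / answers incorrectly), so a
# whole vector of category probabilities must be estimated.
p = np.array([0.55, 0.15, 0.30])      # hypothetical category probabilities
counts = rng.multinomial(n, p)        # observed category counts
p_hat = counts / n                    # ML estimate of the probability vector

print(f"binomial estimate of zeta: {zeta_hat:.2f}")
print("multinomial estimates:    ", np.round(p_hat, 2))
```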
“…If ζ̂ is greater than or equal to some predetermined constant, ζ₀, the examinee is said to have passed the test; otherwise, he/she fails. For comments on how to determine ζ₀, see Hambleton et al. (1978), Glass (1978), and Wilcox (1980). One approach to characterizing such tests is to estimate the probability that, for the typical examinee, ζ̂ ≥ ζ₀ when ζ ≥ ζ₀, or that ζ̂ < ζ₀ when ζ < ζ₀. For the special case of one item per skill (i.e., k = 1), these probabilities can be estimated when the binomial error model (e.g., Lord and Novick, 1968, Chapter 23) or the Dirichlet-multi-…”
mentioning
confidence: 99%
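The decision-accuracy probability described in the excerpt above takes a simple form under the binomial error model it names: if an examinee's true proportion-correct is ζ and the passing score is ζ₀, the chance that an n-item test classifies the examinee correctly is a binomial tail probability. The sketch below is my own illustration of that calculation; the values of ζ, ζ₀, and n, and the function name, are hypothetical.

```python
# Minimal sketch: probability of a correct pass/fail decision under the
# binomial error model, for a true score zeta and passing score zeta0.
from math import ceil
from scipy.stats import binom

def p_correct_decision(zeta: float, zeta0: float, n: int) -> float:
    """Probability the observed proportion correct falls on the same side
    of the passing score zeta0 as the true score zeta."""
    cutoff = ceil(n * zeta0)                  # minimum number correct to pass
    if zeta >= zeta0:
        return binom.sf(cutoff - 1, n, zeta)  # P(X >= cutoff)
    return binom.cdf(cutoff - 1, n, zeta)     # P(X < cutoff)

# Example: true score 0.85, passing score 0.80, tests of increasing length.
for n in (10, 20, 40):
    print(n, round(p_correct_decision(0.85, 0.80, n), 3))
```

Longer tests raise the probability of a correct classification, which is the trade-off at the heart of determining test length.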
“…Thus the course may cover many distinct topics and involve rote learning and complex argument, calculation and non-quantitative description, lectures and laboratory work, good teaching and under-emphasis. The Rasch model, and item response theory generally, are then inappropriate (Wilcox, 1980; Burton, 2004b). Indeed, typical academic tests of the kinds discussed here conform much more to the Posey model even though the facts and ideas to be tested are usually neither independent nor equivalent in difficulty and utility.…”
Section: Two Contrasting Types Of Test
mentioning
confidence: 98%
“…Indeed, individual items may differ in inherent difficulty, and student knowledge may be patchy. However, there is one particular situation in which these complications are irrelevant, namely that in which the knowledge domain is sampled in a completely random way (Wilcox, 1980). Few tests do this.…”
Section: When Is the Posey Model Valid?
mentioning
confidence: 99%
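As a small illustration of the point about random sampling (my own sketch, not from Burton or Wilcox), the simulation below draws items completely at random from a large hypothetical domain; the resulting number-correct scores match the binomial model's mean and variance even though the examinee's knowledge of individual items is all-or-nothing and patchy.

```python
# Minimal sketch (hypothetical domain): random item sampling yields binomial scores.
import numpy as np

rng = np.random.default_rng(1)

# An examinee who knows roughly 60% of a 10,000-item domain, in an
# all-or-nothing, patchy way.
domain = rng.random(10_000) < 0.6
p = domain.mean()                      # true proportion of the domain known

n, reps = 25, 50_000
idx = rng.integers(0, domain.size, size=(reps, n))   # random item draws per test
scores = domain[idx].sum(axis=1)                     # number correct on each test

print("simulated mean / variance:", scores.mean().round(2), scores.var().round(2))
print("binomial  mean / variance:", round(n * p, 2), round(n * p * (1 - p), 2))
```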