Since Cox and Vargas (1966) introduced their pretest-posttest validity index for criterion-referenced test items, a great number of additions and modifications have followed. All are based on the idea of gain scoring; that is, they are computed from the differences between proportions of pretest and posttest item responses. Although the method is simple and generally considered the prototype of criterion-referenced item analysis, it has many serious disadvantages. Some of these go back to the fact that it leads to indices that presuppose a dual test administration and rest on population-dependent item p values. Others have to do with the global information about discriminating power that these indices provide, the implicit weighting they suppose, and the meaningless maximization of posttest scores they lead to. Analyzing the pretest-posttest method from a latent trait point of view, it is proposed to replace indices like Cox and Vargas' Dpp by an evaluation of the item information function at the mastery score. An empirical study was conducted to compare the differences in item selection between the two methods.

Thanks are due to Fred N. Kerlinger, Gideon J. Mellenbergh, Robert F. van Naerssen, and Egbert Warries for their helpful comments; to Hans van Aalst, Fred Boesenkool, Kees Hellingman, Ton Heuvelman, Rien Steen, Niels Veldhuizen, Ronny Wierstra, and Theo Wubbels for participating in the empirical study and for computational assistance; and to Paula Achterberg for typing the manuscript.

As in any other area of educational and psychological measurement, more attention has been paid to reliability than to validity aspects of criterion-referenced measurement. Several test parameters have been proposed and compared with their norm-referenced counterparts, assessment methods have been introduced and examined using both real and simulated data, and the criterion-referenced reliability problem seems well on its way to a great diversity of solutions (Hambleton & Novick, 1973; Huynh, 1976a, 1976b; Livingston, 1972; Marshall, 1975). That less powerful efforts have been made to tackle the validity problem may in part be due to a standpoint advocated by, for example, Millman (1974). According to this standpoint, criterion-referenced validity is the same as content validity, and to establish it the construction of a well-defined domain of items is sufficient. Once an item is included in the domain, no empirical information or item analysis can or should alter its validity.
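To make the contrast drawn in the abstract concrete, the short sketch below computes a gain-score index of the Cox-Vargas type (posttest proportion correct minus pretest proportion correct for a single item) and evaluates an item information function at a mastery score. The choice of the two-parameter logistic model, the item parameters, the response vectors, and the mastery score are illustrative assumptions only; this excerpt does not specify which latent trait model the paper adopts.

    import numpy as np

    def dpp(pretest, posttest):
        """Gain-score index of the Cox-Vargas type: posttest proportion
        correct minus pretest proportion correct for one item (0/1 scores)."""
        return np.mean(posttest) - np.mean(pretest)

    def item_information_2pl(theta, a, b):
        """Item information of a two-parameter logistic item (assumed model):
        I(theta) = a^2 * P(theta) * (1 - P(theta))."""
        p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
        return a ** 2 * p * (1.0 - p)

    # Hypothetical data: responses of ten examinees to one item, pre and post.
    pre = np.array([0, 0, 1, 0, 1, 0, 0, 1, 0, 0])
    post = np.array([1, 1, 1, 0, 1, 1, 0, 1, 1, 0])
    theta_m = 0.5  # assumed mastery score on the latent scale

    print("Dpp:", dpp(pre, post))                                # 0.7 - 0.3 = 0.4
    print("I(theta_m):", item_information_2pl(theta_m, a=1.2, b=0.0))

The gain-score index depends only on the two observed proportions and hence on the particular groups tested, whereas the information function is evaluated on the latent scale at the mastery score, which is the shift the abstract proposes.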
These authors distinguish three approaches to the validity problem. The first is the aforementioned item form or item generation rule approach. In it, the fixed syntactical structure and variable elements of item sentences are used to define domains and, eventually with the aid of the computer, to sample items. Item validity is automatically guaranteed because the definition of the domain and the construction of the items are accomplished by the same set of rules. The second approach is a judgmental procedure in which content specialists are retained. The judgmental task may assume ...
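To give a concrete, if oversimplified, picture of such a generation rule, the sketch below uses a fixed sentence frame with two variable slots: the domain is the set of all possible slot fillings, and a test is assembled by sampling items from it by computer. The arithmetic frame and the replacement sets are invented for illustration and are not taken from the paper.

    import itertools
    import random

    # Illustrative item form: a fixed sentence frame with variable elements.
    # The domain is every item obtainable by filling the slots; items are
    # then sampled from that domain to assemble a test.
    frame = "What is {a} + {b}?"
    replacement_sets = {"a": range(1, 10), "b": range(1, 10)}

    domain = [frame.format(a=a, b=b)
              for a, b in itertools.product(replacement_sets["a"],
                                            replacement_sets["b"])]

    test_items = random.sample(domain, k=5)  # computer-sampled items
    print(test_items)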