Since Cox and Vargas (1966) introduced their pretest-posttest validity index for criterion-referenced test items, a great number of additions and modifications have followed. All are based on the idea of gain scoring; that is, they are computed from the differences between proportions of pretest and posttest item responses. Although the method is simple and generally considered the prototype of criterion-referenced item analysis, it has many serious disadvantages. Some of these stem from the fact that it leads to indices that require a dual test administration and are based on population-dependent item p values. Others have to do with the global information about the discriminating power that these indices provide, the implicit weighting they suppose, and the meaningless maximization of posttest scores they lead to. Analyzing the pretest-posttest method from a latent trait point of view, it is proposed that indices like Cox and Vargas' Dpp be replaced by an evaluation of the item information function at the mastery score. An empirical study was conducted to compare the differences in item selection between the two methods.

As in any other area of educational and psychological measurement, more attention has been paid to reliability than to validity aspects of criterion-referenced measurement. Several test parameters have been proposed and compared with their norm-referenced counterparts, assessment methods have been introduced and examined using both real and simulated data, and the criterion-referenced reliability problem seems on its way to a great diversity of solutions (
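For concreteness, the two quantities contrasted in the abstract can be sketched as follows; the notation (p_pre and p_post for the item proportions correct, a two-parameter logistic item response function, and θ_c for the mastery score) is illustrative rather than taken from the original text. The gain-score index is

\[
D_{pp} = p_{\text{post}} - p_{\text{pre}},
\]

the difference between the posttest and pretest proportions of correct responses to the item, whereas, under the assumed two-parameter logistic model with discrimination a_i and difficulty b_i,

\[
P_i(\theta) = \frac{1}{1 + \exp[-a_i(\theta - b_i)]},
\qquad
I_i(\theta) = a_i^{2}\, P_i(\theta)\,[1 - P_i(\theta)],
\]

so that the proposed alternative amounts to evaluating the item information function I_i(θ) at the mastery score θ_c rather than computing D_pp from two test administrations.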