Several person-fit statistics have been proposed to detect item score patterns that do not fit an item response theory model. To classify response patterns as misfitting, the distribution of a person-fit statistic is needed. The theoretical null distributions of several fit statistics have been derived for paper-and-pencil (P&P) tests. However, it is unknown whether these distributions also hold for computerized adaptive tests (CAT). A three-part simulation study was conducted. In the first study, the theoretical distribution of the l_z statistic across trait (θ) levels for CAT and P&P tests was investigated. The distribution of the l*_z statistic proposed by Snijders (in press) was also investigated. Results indicated that the distributions of both l_z and l*_z differed from the theoretical distribution in CAT. The second study examined the distributions of l_z and l*_z using simulation. These simulated distributions, when based on θ̂ (the estimated θ), were found to be problematic in CAT. In the third study, the detection rates of l*_z and l_z were compared. The rates for both statistics were found to be similar in most cases.
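As an illustration of the standardized log-likelihood statistic (l_z) discussed above, the following minimal sketch computes it for dichotomous item scores. It assumes the usual definition of l_z (the log-likelihood of the score pattern, centered and scaled by its expectation and variance at a given θ) and, purely for simplicity, a Rasch model. The function names `rasch_prob` and `lz_statistic` and the example difficulties are hypothetical and are not taken from the study.

```python
import math


def rasch_prob(theta, difficulty):
    """Probability of a correct response under a Rasch model (an assumption here)."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def lz_statistic(scores, theta, difficulties):
    """Standardized log-likelihood person-fit statistic for 0/1 item scores.

    scores       -- list of 0/1 item scores
    theta        -- trait value (true or estimated) at which to evaluate fit
    difficulties -- item difficulties, one per score
    """
    log_lik = 0.0    # observed log-likelihood of the pattern
    expected = 0.0   # expectation of the log-likelihood at theta
    variance = 0.0   # variance of the log-likelihood at theta
    for x, b in zip(scores, difficulties):
        p = rasch_prob(theta, b)
        q = 1.0 - p
        log_lik += x * math.log(p) + (1 - x) * math.log(q)
        expected += p * math.log(p) + q * math.log(q)
        variance += p * q * math.log(p / q) ** 2
    if variance == 0.0:
        # Every item has p = 0.5 (difficulty equal to theta), so the
        # statistic is undefined for this item set.
        raise ValueError("variance of the log-likelihood is zero")
    return (log_lik - expected) / math.sqrt(variance)


# Hypothetical five-item test, evaluated at theta = 0.
difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]
consistent = [1, 1, 1, 0, 0]   # easy items correct, hard items incorrect
aberrant = [0, 0, 1, 1, 1]     # easy items incorrect, hard items correct

lz_consistent = lz_statistic(consistent, 0.0, difficulties)
lz_aberrant = lz_statistic(aberrant, 0.0, difficulties)
```

Large negative values indicate a pattern that is unlikely under the model. Note that the variance term shrinks as item difficulties approach θ, because log(p/q) is zero at p = 0.5; this is worth keeping in mind when items are targeted at the examinee's trait level, as in a CAT.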
Item scores that do not fit an assumed item response theory model may cause the latent trait value to be estimated inaccurately. Several person-fit statistics for detecting nonfitting score patterns for paper-and-pencil tests have been proposed. In the context of computerized adaptive tests (CAT), the use of person-fit analysis has hardly been explored. In this study, new person-fit statistics are proposed, and critical values for these statistics are derived from existing statistical theory. Statistics are proposed that are sensitive to runs of correct or incorrect item scores and are based on all items administered in a CAT or based on subsets of items, using observed and expected item scores and using cumulative sum (CUSUM) procedures. The theoretical and empirical distributions of the statistics are compared and detection rates are investigated. Results show that the nominal and empirical Type I error rates are comparable for CUSUM procedures when the number of items in each subset and the number of measurement points are not too small.
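The idea of comparing observed and expected item scores with a CUSUM procedure can be sketched as follows. This is a generic two-sided CUSUM on item-score residuals, not necessarily the exact statistic proposed in the study: the residual scaling by test length, the function names `cusum_path` and `flags_misfit`, and the bounds of ±0.1 are all assumptions made for illustration, and critical values would in practice have to be derived or simulated.

```python
def cusum_path(scores, probs):
    """Two-sided CUSUM over item-score residuals.

    scores -- observed 0/1 item scores in administration order
    probs  -- model-expected probabilities of a correct response
    Returns a list of (upper, lower) sums, one pair per item.
    """
    n = len(scores)
    upper, lower = 0.0, 0.0
    path = []
    for x, p in zip(scores, probs):
        residual = (x - p) / n                 # scaled observed-minus-expected
        upper = max(0.0, upper + residual)     # accumulates runs of correct scores
        lower = min(0.0, lower + residual)     # accumulates runs of incorrect scores
        path.append((upper, lower))
    return path


def flags_misfit(path, upper_bound, lower_bound):
    """True if the CUSUM crosses either (hypothetical) bound at any point."""
    return any(u > upper_bound or l < lower_bound for u, l in path)


# Hypothetical 20-item CAT in which every item has expected probability 0.5.
probs = [0.5] * 20
run_pattern = [0] * 10 + [1] * 10   # long run of incorrect, then correct scores
alternating = [1, 0] * 10           # no runs

run_flagged = flags_misfit(cusum_path(run_pattern, probs), 0.1, -0.1)
alt_flagged = flags_misfit(cusum_path(alternating, probs), 0.1, -0.1)
```

Because the sums reset at zero, the procedure responds to sustained runs of positive or negative residuals rather than to isolated unexpected scores, which is what makes it sensitive to runs of correct or incorrect item scores.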
Item scores that do not fit an assumed item response theory model may cause the latent trait value to be estimated inaccurately. For computerized adaptive tests (CATs) with dichotomous items, several person-fit statistics for detecting nonfitting item score patterns have been proposed. For both paper-and-pencil (P&P) tests and CATs, detection of person misfit with polytomous items has hardly been explored. In this simulation study, the theoretical and empirical null distributions of a person-fit statistic for polytomous items are compared for P&P tests and CATs. Results show that the empirical distribution of this statistic was close to the standard normal distribution for both P&P tests and CATs. Statistics designed specifically for CATs are also proposed. In these statistics, observed and expected item scores are compared using cumulative sum (CUSUM) procedures.