This study examines model selection indices for use with dichotomous mixture item response theory (IRT) models. Five indices are considered: Akaike's information criterion (AIC), the Bayesian information criterion (BIC), the deviance information criterion (DIC), the pseudo-Bayes factor (PsBF), and posterior predictive model checks (PPMC). The five indices provide somewhat different recommendations for a set of real data. Results from a simulation study indicate that BIC selects the correct (i.e., the generating) model well under most conditions simulated and for all three of the dichotomous mixture IRT models considered. PsBF is almost as effective. AIC and PPMC tend to select the more complex model under some conditions. DIC is least effective for this use.
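As a point of reference for how two of these indices trade off fit against complexity, the sketch below computes AIC and BIC from a model's maximized log-likelihood using their standard definitions (AIC = -2·logL + 2k, BIC = -2·logL + k·log n). The mixture-model log-likelihoods and parameter counts shown are made-up placeholders, not values from the study.

```python
import numpy as np

def aic(log_lik: float, n_params: int) -> float:
    """Akaike's information criterion: smaller is better."""
    return -2.0 * log_lik + 2.0 * n_params

def bic(log_lik: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion: penalizes extra parameters more
    heavily than AIC once the sample size exceeds about 8."""
    return -2.0 * log_lik + n_params * np.log(n_obs)

# Hypothetical comparison of a 2-class and a 3-class mixture IRT fit
# to the same data (log-likelihoods and parameter counts are made up).
fits = {"2-class": (-10450.3, 102), "3-class": (-10421.8, 154)}
n_examinees = 1000
for label, (ll, k) in fits.items():
    print(f"{label}: AIC = {aic(ll, k):.1f}, BIC = {bic(ll, k, n_examinees):.1f}")
```

Because BIC's penalty grows with the sample size, it tends to favor the smaller model in cases where AIC would pick the larger one, which is consistent with the pattern of recommendations described in the abstract.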
Methods for detecting differential item functioning (DIF) have been proposed primarily for dichotomous response models in item response theory. Three measures of DIF for the dichotomous response model are extended to Samejima's graded response model: two measures based on area differences between item true score functions, and a χ2 statistic for comparing differences in item parameters. An illustrative example is presented.
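To make the area-based idea concrete, here is a minimal sketch of one such measure: the unsigned area between the reference- and focal-group item true score functions under the graded response model, approximated numerically over a bounded θ interval. The item parameters are illustrative inventions, and the bounded-interval approximation is an assumption of this sketch rather than the exact measures used in the paper.

```python
import numpy as np

def grm_expected_score(theta, a, b):
    """Item true score (expected item score) under Samejima's graded
    response model; b holds the ordered category boundary parameters."""
    theta = np.asarray(theta, dtype=float)[:, None]   # (n_theta, 1)
    b = np.asarray(b, dtype=float)[None, :]           # (1, n_boundaries)
    p_star = 1.0 / (1.0 + np.exp(-a * (theta - b)))   # cumulative boundary probs
    return p_star.sum(axis=1)                         # score runs 0..len(b)

def unsigned_area(a_ref, b_ref, a_foc, b_foc, lo=-4.0, hi=4.0, n=401):
    """Approximate the unsigned area between the reference and focal
    true score functions with a trapezoidal sum on a theta grid."""
    theta = np.linspace(lo, hi, n)
    diff = np.abs(grm_expected_score(theta, a_ref, b_ref)
                  - grm_expected_score(theta, a_foc, b_foc))
    return float(np.sum(0.5 * (diff[1:] + diff[:-1]) * np.diff(theta)))

# Hypothetical 4-category item with slightly shifted focal-group boundaries.
print(unsigned_area(a_ref=1.2, b_ref=[-1.0, 0.0, 1.0],
                    a_foc=1.0, b_foc=[-0.8, 0.3, 1.2]))
```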
Applications of item response theory (IRT) to practical testing problems, including equating, differential item functioning, and computerized adaptive testing, require a common metric for item parameter estimates. This study compared three methods for developing a common metric under IRT: (1) linking separate calibration runs using equating coefficients from the characteristic curve method, (2) concurrent calibration based on marginal maximum a posteriori estimation, and (3) concurrent calibration based on marginal maximum likelihood estimation. For smaller numbers of common items, linking using the characteristic curve method yielded smaller root mean square differences for both item discrimination and difficulty parameters. For larger numbers of common items, the three methods yielded similar results.
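For readers unfamiliar with the linking step, the sketch below shows what is done with equating coefficients once they have been estimated, for example by a characteristic curve (Stocking-Lord type) procedure: under the usual linear metric transformation θ* = Aθ + B, difficulties map as b* = A·b + B and discriminations as a* = a/A. The coefficient values and item parameters here are hypothetical; the estimation of A and B itself is not shown.

```python
def rescale_to_base_metric(a: float, b: float, A: float, B: float):
    """Place item parameters from a separate calibration onto the base
    metric via the linear transformation theta* = A*theta + B, which
    implies b* = A*b + B and a* = a / A."""
    return a / A, A * b + B

# Hypothetical equating coefficients from a characteristic curve linking.
A, B = 1.08, -0.15
a_star, b_star = rescale_to_base_metric(a=1.4, b=0.5, A=A, B=B)
print(f"rescaled a = {a_star:.3f}, rescaled b = {b_star:.3f}")
```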
Type I error rates for the likelihood ratio test for detecting differential item functioning (DIF) were investigated using Monte Carlo simulations. Two- and three-parameter item response theory (IRT) models were used to generate 100 datasets of a 50-item test for samples of 250 and 1,000 simulated examinees for each IRT model. Item parameters were estimated by marginal maximum likelihood for three IRT models: the three-parameter model, the three-parameter model with a fixed guessing parameter, and the two-parameter model. All DIF comparisons were simulated by randomly pairing two samples from each sample size and IRT model condition, so that for each sample size and IRT model condition there were 50 pairs of reference and focal groups. Type I error rates for the two-parameter model were within theoretically expected values at each of the α levels considered. Type I error rates for the three-parameter model and the three-parameter model with a fixed guessing parameter, however, differed from the theoretically expected values at the α levels considered.

Index terms: bias, differential item functioning, item bias, item response theory, likelihood ratio test for DIF.

An item is said to be functioning differentially when the probability of a correct response differs for examinees at the same trait level but from different groups (Pine, 1977). Because such items threaten validity and may seriously interfere with efforts to equate tests, they must be removed from consideration. Thissen,
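As a rough illustration of the test whose Type I error rates are studied above, the sketch below computes the likelihood ratio statistic G2 = -2(logL_compact - logL_augmented), where the compact model constrains the studied item's parameters to be equal across the reference and focal groups and the augmented model frees them, and refers it to a chi-square distribution. The log-likelihoods and degrees of freedom are placeholder values for illustration, not quantities from the simulation.

```python
from scipy.stats import chi2

def lr_dif_test(loglik_compact: float, loglik_augmented: float, df: int):
    """Likelihood ratio DIF test: G2 compares the constrained (compact)
    and freed (augmented) fits; df equals the number of freed parameters."""
    g2 = -2.0 * (loglik_compact - loglik_augmented)
    return g2, chi2.sf(g2, df)

# Placeholder log-likelihoods from fitting both models to the combined
# reference/focal data; df = 2 for a 2PL item with a and b freed.
g2, p = lr_dif_test(-14210.7, -14206.9, df=2)
print(f"G2 = {g2:.2f}, p = {p:.4f}")
```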