A differential item functioning (DIF) detection method for testlet-based data was proposed and evaluated in this study. The proposed DIF model is an extension of a bifactor multidimensional item response theory (MIRT) model for testlets. Unlike traditional item response theory (IRT) DIF models, the proposed model takes testlet effects into account, thus estimating DIF magnitude appropriately when a test is composed of testlets. A fully Bayesian estimation method was adopted for parameter estimation. The recovery of parameters was evaluated for the proposed DIF model. Simulation results revealed that the proposed bifactor MIRT DIF model produced better estimates of DIF magnitude and higher DIF detection rates than the traditional IRT DIF model for all simulation conditions. A real data analysis was also conducted by applying the proposed DIF model to a statewide reading assessment data set.
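As a rough sketch of the modeling idea only (the study itself uses a fully Bayesian estimator, and the data, testlet layout, and grouping variable below are hypothetical), a bifactor testlet structure can be fit by group with the R package mirt and the items then screened for DIF while the testlet-specific factors absorb the local dependence:

```r
library(mirt)
set.seed(1)

# Hypothetical data: 8 dichotomous items in 2 testlets of 4 items each,
# simulated from a bifactor (testlet) structure for two groups of 500.
a <- cbind(rlnorm(8, 0.2, 0.2),           # general-factor slopes
           c(rep(0.8, 4), rep(0, 4)),     # testlet 1 slopes
           c(rep(0, 4), rep(0.8, 4)))     # testlet 2 slopes
d <- rnorm(8)
resp  <- rbind(simdata(a, d, 500, itemtype = "dich"),
               simdata(a, d, 500, itemtype = "dich"))
group <- rep(c("ref", "focal"), each = 500)

# Bifactor structure: one general factor plus one specific factor per testlet
# (factors are uncorrelated by default in mirt.model syntax).
spec <- mirt.model("
  G  = 1-8
  S1 = 1-4
  S2 = 5-8")

# Baseline model with item parameters constrained equal across groups,
# freeing the latent means and variances in the focal group.
mg <- multipleGroup(resp, spec, group = group,
                    invariance = c("slopes", "intercepts",
                                   "free_means", "free_var"))

# Likelihood-ratio DIF tests on the general-factor slope and the intercept,
# dropping the equality constraints one item at a time.
DIF(mg, which.par = c("a1", "d"), scheme = "drop")
```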
A differential item functioning (DIF) decomposition model separates testlet-item DIF into two sources: item-specific differential functioning and testlet-specific differential functioning. This article provides an alternative model-building framework and estimation approach for the DIF decomposition model proposed by Beretvas and Walker (2012). Whereas their model is formulated under multilevel modeling with restricted pseudolikelihood estimation, our approach builds the DIF decomposition directly on the random-weights linear logistic test model framework with marginal maximum likelihood estimation. In addition to demonstrating the approach's performance, we provide detailed information on how to implement this DIF decomposition model using an item response theory software program; applying DIF decomposition may be challenging for practitioners, yet practical information on how to implement it has previously been unavailable in the measurement literature.
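The abstract does not name the software used, so the snippet below is only a generic illustration of the decomposition idea, recast as an explanatory (Rasch-family) generalized linear mixed model in R with lme4 rather than the authors' random-weights LLTM implementation; the simulated data, the 0.4/0.3 DIF sizes, and the variable names are all made up for the example:

```r
library(lme4)
set.seed(2)

# Made-up long-format data: 300 persons x 10 Rasch items; items 1-5 form a
# testlet and item 3 is the studied item.  DIF is built in as 0.4 logits of
# testlet-specific and 0.3 logits of item-specific advantage for the focal group.
n <- 300; J <- 10
person <- factor(rep(1:n, each = J))
item   <- factor(rep(1:J, times = n))
focal  <- rep(rbinom(n, 1, 0.5), each = J)
theta  <- rep(rnorm(n), each = J)
beta   <- rep(seq(-1, 1, length.out = J), times = n)
studied_testlet <- as.numeric(item %in% 1:5)
studied_item    <- as.numeric(item == 3)
eta  <- theta - beta + focal * (0.4 * studied_testlet + 0.3 * studied_item)
long <- data.frame(resp = rbinom(n * J, 1, plogis(eta)),
                   person, item, focal, studied_testlet, studied_item)

# Explanatory GLMM: fixed item effects, a random person ability, a group-impact
# term, and two focal-group interactions that decompose DIF into
# testlet-specific and item-specific components.
fit <- glmer(resp ~ 0 + item + focal + focal:studied_testlet + focal:studied_item +
               (1 | person), family = binomial, data = long)
fixef(fit)[c("focal:studied_testlet", "focal:studied_item")]
```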
Item response theory (IRT) is a class of latent variable models used to develop educational and psychological tests (e.g., standardized tests, personality tests, and tests for licensure and certification). We review the theory and practice of IRT across two articles. In Part 1, we cover a broad range of topics, including the foundations of educational measurement, the basics of IRT, and applications of IRT using R, focusing in particular on the topics covered by the mirt package: unidimensional and multidimensional IRT models for dichotomous and polytomous items with continuous and discrete factors, confirmatory and multigroup analysis in IRT, and estimation algorithms. In Part 2, we turn to more practical aspects of IRT, namely scoring, scaling, and equating.
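To make the listed topics concrete, a few representative mirt calls are shown below; the bundled LSAT7 and Science data sets and the particular model choices are our own illustration rather than examples taken from the article:

```r
library(mirt)

# Dichotomous items: unidimensional 2PL on the bundled LSAT7 data
dat <- expand.table(LSAT7)
mod_2pl <- mirt(dat, 1, itemtype = "2PL")
coef(mod_2pl, simplify = TRUE)                 # slopes and intercepts

# Polytomous items: graded response model on the bundled Science data
mod_grm <- mirt(Science, 1, itemtype = "graded")

# Confirmatory two-factor model via mirt.model syntax
spec <- mirt.model("
  F1  = 1-3
  F2  = 4-5
  COV = F1*F2")
mod_cfa <- mirt(dat, spec)

# Multigroup analysis with an arbitrary grouping variable (illustration only)
grp <- rep(c("A", "B"), length.out = nrow(dat))
mod_mg <- multipleGroup(dat, 1, group = grp,
                        invariance = c("slopes", "intercepts",
                                       "free_means", "free_var"))
```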
This study investigated differential item functioning (DIF) mechanisms in the context of differential testlet effects across subgroups. Specifically, we investigated DIF manifestations when the stochastic ordering assumption on the nuisance dimension in a testlet does not hold. DIF hypotheses were formulated analytically using a parametric marginal item response function approach and compared with empirical DIF results from a unidimensional item response theory approach. The comparisons were made in terms of type of DIF (uniform or non-uniform) and direction (whether the focal or reference group was advantaged). In general, the DIF hypotheses were supported by the empirical results, showing the usefulness of the parametric approach in explaining DIF mechanisms. Both analytical predictions of DIF and the empirical results provide insights into conditions where a particular type of DIF becomes dominant in a specific DIF direction, which is useful for the study of DIF causes.
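The marginal-item-response-function argument can be reproduced numerically in a few lines of R. Assuming a hypothetical 2PL bifactor testlet item with arbitrary parameter values, integrating the item response function over the nuisance (testlet) dimension for two groups that differ only in the testlet-effect distribution yields different marginal curves even though the item parameters are identical, which is what a unidimensional analysis flags as DIF:

```r
# Marginal item response function for a 2PL bifactor testlet item:
# P(theta) = integral over s of plogis(a_g*theta + a_s*s + d) * N(s; mu_s, sigma_s)
marginal_irf <- function(theta, a_g = 1.2, a_s = 1.0, d = 0, mu_s = 0, sigma_s = 1) {
  sapply(theta, function(th)
    integrate(function(s) plogis(a_g * th + a_s * s + d) *
                dnorm(s, mean = mu_s, sd = sigma_s),
              lower = -Inf, upper = Inf)$value)
}

theta   <- seq(-3, 3, by = 1)
p_ref   <- marginal_irf(theta, sigma_s = 0.5)   # reference group: weak testlet effect
p_focal <- marginal_irf(theta, sigma_s = 1.5)   # focal group: strong testlet effect
round(cbind(theta, p_ref, p_focal, gap = p_ref - p_focal), 3)
# The gap changes sign across theta (non-uniform DIF); shifting mu_s for one
# group instead produces a gap of roughly constant sign (uniform DIF).
```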
Item response theory (IRT) is a class of latent variable models used to develop educational and psychological tests (e.g., standardized tests, personality tests, and tests for licensure and certification). We offer readers a comprehensive overview of the theory and applications of IRT across two articles. While Part 1 of the review discusses topics such as the foundations of educational measurement, IRT models, item parameter estimation, and applications of IRT with R, this Part 2 reviews IRT-based test scoring and scaling. The primary focus is on test equating topics, such as equating designs, IRT-based equating methods, anchor stability check methods, and impact data analysis, that psychometricians deal with in practice for large-scale standardized assessments. These analyses are illustrated in the Example section using data from Kolen and Brennan (2014). We also cover the foundations of IRT, IRT-based methods for estimating person ability parameters, and scaling and scale scores.
This article is categorized under:
Applications of Computational Statistics > Psychometrics
Software for Computational Statistics > Software/Statistical Software
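Of the equating topics listed in the Part 2 abstract above, common-item (anchor) scale linking is easy to sketch. The snippet below shows the mean/sigma linking method described in Kolen and Brennan (2014) with made-up anchor difficulties, not the article's data; characteristic-curve methods such as Haebara or Stocking-Lord and the subsequent true-score or observed-score equating step would build on the same linked scale.

```r
# Hypothetical anchor-item difficulty estimates from two separate calibrations
b_new  <- c(-1.20, -0.45, 0.10, 0.65, 1.30)   # new form scale
b_base <- c(-1.05, -0.30, 0.20, 0.80, 1.40)   # base (old) form scale

# Mean/sigma linking constants
A <- sd(b_base) / sd(b_new)
B <- mean(b_base) - A * mean(b_new)

# Place new-form parameters and abilities on the base scale
rescale_b     <- function(b) A * b + B        # difficulties
rescale_a     <- function(a) a / A            # discriminations
rescale_theta <- function(theta) A * theta + B

# A simple anchor stability check: large displacements after linking suggest
# an anchor item is drifting and may need to be dropped from the anchor set.
round(rescale_b(b_new) - b_base, 3)
```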