The multiple indicators, multiple causes (MIMIC) method with a short, pure anchor was proposed to detect differential item functioning (DIF). A simulation study showed that the MIMIC method with an anchor of 1, 2, 4, or 10 DIF-free items yielded a well-controlled Type I error rate even when tests contained as many as 40% DIF items. In general, a longer anchor increased the power of DIF detection, and a 4-item anchor was long enough to yield high power. An iterative MIMIC procedure was proposed to locate a set of DIF-free items to serve as a pure anchor so that the MIMIC method could proceed properly. In another simulation study, this iterative procedure yielded a perfect (or nearly perfect) accuracy rate in locating a set of up to 4 DIF-free items.
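As background (the standard MIMIC-DIF formulation, which the abstract assumes rather than spells out), uniform DIF is modeled as a direct effect of the grouping covariate on an item, with anchor items constrained to have no such effect. A sketch in the usual notation, where the symbols are conventional loadings and regression weights rather than values taken from the abstract:

```latex
% Latent ability regressed on the grouping covariate z (z = 0 or 1):
\eta = \gamma z + \zeta
% Latent response underlying item i:
y_i^{*} = \lambda_i \eta + \beta_i z + \varepsilon_i
% Uniform DIF in item i  \iff  \beta_i \neq 0 ;
% anchor (DIF-free) items are constrained to \beta_i = 0 .
```

With a pure anchor, the constraints \(\beta_i = 0\) hold only for genuinely DIF-free items, which is why anchor contamination inflates the Type I error rate.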
The DIF-free-then-DIF (DFTD) strategy for differential item functioning (DIF) assessment consists of two steps: (a) select a set of items that are most likely to be DIF-free and (b) assess the other items for DIF using the selected items as anchors. The rank-based method together with the computer software IRTLRDIF can select a set of DIF-free polytomous items very accurately, but it loses accuracy when tests contain many DIF items. To resolve this problem, the authors developed a new method by adding a scale purification procedure to the rank-based method and conducted two simulation studies to evaluate its performance in DIF assessment. It was found that the new method outperformed the rank-based method in identifying DIF-free items, especially when the tests contained many DIF items. In addition, the new method, combined with the DFTD strategy, yielded a well-controlled Type I error rate and a high power rate of DIF detection. In contrast, conventional DIF assessment methods yielded an inflated Type I error rate and a deflated power rate when the tests contained many DIF items favoring the same group. In conclusion, the simulation results support the new method and the DFTD strategy in DIF assessment.
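The two-step logic can be sketched in code. This is a hypothetical toy, not the authors' implementation: `dif_stat` stands in for a real DIF statistic (such as the likelihood-ratio chi-square that IRTLRDIF reports), with items 0–2 simulated as DIF items whose presence in the matching score contaminates every other item's statistic.

```python
# Toy sketch of rank-based DIF-free item selection with scale purification.
# All names and values here are illustrative assumptions.

CRIT = 3.84           # chi-square critical value, df = 1, alpha = .05
N_ITEMS = 10
TRUE_DIF = {0, 1, 2}  # simulated DIF items in this toy setup

def dif_stat(item, flagged):
    """Toy DIF statistic, computed as if the matching score excluded `flagged`.
    Each unflagged DIF item (other than `item` itself) inflates the statistic."""
    contamination = 2.0 * sum(1 for j in TRUE_DIF
                              if j != item and j not in flagged)
    return (10.0 if item in TRUE_DIF else 0.5) + contamination

def purify():
    """Scale purification: re-flag items until the flagged set stabilizes."""
    flagged = set()
    while True:
        stats = {i: dif_stat(i, flagged) for i in range(N_ITEMS)}
        new_flagged = {i for i, s in stats.items() if s > CRIT}
        if new_flagged == flagged:
            return flagged, stats
        flagged = new_flagged

def select_anchor(k=4):
    """Step (a) of DFTD: rank unflagged items by purified statistic,
    keep the k smallest as the anchor for step (b)."""
    flagged, stats = purify()
    ranked = sorted(i for i in range(N_ITEMS) if i not in flagged)
    ranked.sort(key=lambda i: stats[i])
    return ranked[:k]

print(select_anchor())  # [3, 4, 5, 6]
```

Without the purification loop, the contamination term would inflate the clean items' statistics past the critical value, which mirrors why the plain rank-based method loses accuracy when many items show DIF.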
This study presents the random-effects rating scale model (RE-RSM), which takes into account randomness in the thresholds over persons by treating them as random effects, adding a random variable for each threshold in the rating scale model (RSM) (Andrich, 1978). The RE-RSM turns out to be a special case of the multidimensional random coefficients multinomial logit model (MRCMLM) (Adams, Wilson, & Wang, 1997), so the estimation procedures for the MRCMLM can be applied directly. The simulation results indicated that when the data were generated from the RSM, fitting either the RSM or the RE-RSM made little difference: both resulted in accurate parameter recovery. When the data were generated from the RE-RSM, fitting the RE-RSM resulted in unbiased estimates, whereas fitting the RSM resulted in biased estimates, large fit statistics for the thresholds, and inflated test reliability. An empirical example of 10 items with four-point rating scales was presented in which four models were compared: the RSM, the RE-RSM, the partial credit model (Masters, 1982), and the constrained random-effects partial credit model. In this real data set, the need for a random-effects formulation became clear.
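The verbal description can be made concrete in the usual RSM notation. A sketch under the standard parameterization (an assumption of this summary, not reproduced from the abstract), for person \(n\), item \(i\), and response categories \(x = 0, \dots, m\):

```latex
% Rating scale model (RSM), with the convention \tau_0 \equiv 0:
P(X_{ni} = x) =
  \frac{\exp \sum_{k=0}^{x} \bigl(\theta_n - \delta_i - \tau_k\bigr)}
       {\sum_{h=0}^{m} \exp \sum_{k=0}^{h} \bigl(\theta_n - \delta_i - \tau_k\bigr)}
% RE-RSM: each threshold becomes person-specific by adding a random effect
\tau_{nk} = \tau_k + \varepsilon_{nk}, \qquad
\varepsilon_{nk} \sim N\!\bigl(0, \sigma_k^2\bigr)
```

Setting every \(\sigma_k^2 = 0\) recovers the ordinary RSM, which is why fitting the RE-RSM to RSM-generated data costs little.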
This study adds a scale purification procedure to the standard MIMIC method for differential item functioning (DIF) detection and assesses its performance through a series of simulations. It is found that the MIMIC method with scale purification (denoted M-SP) outperforms the standard MIMIC method (denoted M-ST) in controlling false-positive rates and yielding higher true-positive rates. Only when the DIF pattern is balanced between groups, or when the test contains a small percentage of DIF items, does M-ST perform as well as M-SP. Moreover, both methods yield a higher true-positive rate under the two-parameter logistic model than under the three-parameter logistic model. M-SP is preferable to M-ST, because DIF patterns in real tests are unlikely to be perfectly balanced and the percentage of DIF items may not be small.
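The false- and true-positive rates the simulations report are simple tallies of flagging decisions against the known DIF status of each item. A minimal sketch of that bookkeeping (assumed, generic code, not the authors'):

```python
# Tally false-positive and true-positive rates across simulation replications.
# `flagged_by_rep` is a list of sets of item indices flagged in each
# replication; `true_dif` is the set of items generated with DIF.

def dif_rates(flagged_by_rep, true_dif, n_items):
    fp = tp = fp_den = tp_den = 0
    for flagged in flagged_by_rep:
        for i in range(n_items):
            if i in true_dif:
                tp_den += 1
                tp += i in flagged   # correctly flagged DIF item
            else:
                fp_den += 1
                fp += i in flagged   # DIF-free item falsely flagged
    return fp / fp_den, tp / tp_den

# Two replications of a 5-item test in which items 0 and 1 truly have DIF:
fpr, tpr = dif_rates([{0, 1, 4}, {0}], {0, 1}, 5)
print(fpr, tpr)  # 1/6 and 3/4 in this toy example
```

Under a well-controlled procedure, the false-positive rate should stay near the nominal alpha level even when the test contains many DIF items.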
Three multiple indicators-multiple causes (MIMIC) methods, namely, the standard MIMIC method (M-ST), the MIMIC method with scale purification (M-SP), and the MIMIC method with a pure anchor (M-PA), were developed to assess differential item functioning (DIF) in polytomous items. In a series of simulations, all three methods yielded a well-controlled Type I error rate when tests did not contain any DIF items. M-ST and M-SP began to yield an inflated Type I error rate and a deflated power when tests contained 10% and 20% DIF items, respectively. M-PA maintained the expected Type I error rate and a high power even when tests contained as many as 40% DIF items. An iterative MIMIC procedure was proposed to select a small set of DIF-free items to serve as the anchor in M-PA. A series of simulations found that this procedure yielded a very high accuracy rate. Two simulated data sets were then analyzed to illustrate applications of these MIMIC methods for DIF assessment in polytomous items.