Differential test functioning (DTF) occurs when one or more items in a test demonstrate differential item functioning (DIF) and the aggregate of these effects is observed at the test level. In many applications, DTF can be more important than DIF because it quantifies the overall effect of DIF at the test level. However, optimal statistical methodology for detecting and understanding DTF has not been developed. This article proposes improved DTF statistics that properly account for sampling variability in item parameter estimates while avoiding the need to compute provisional latent trait estimates for two-step approximations. The properties of the DTF statistics were examined in two Monte Carlo simulation studies using dichotomous and polytomous IRT models. The simulation results revealed that the improved DTF statistics achieved optimal statistical properties, such as consistent Type I error rates. Next, an empirical analysis demonstrated the application of the proposed methodology. Applied settings where the DTF statistics can be beneficial are suggested, and future areas of DTF research are proposed.
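As a rough illustration of the quantities being aggregated, the sketch below computes a signed and an unsigned DTF index for hypothetical two-parameter logistic (2PL) item parameters by integrating the difference between two groups' test characteristic curves over a standard-normal latent trait density. This is only a minimal sketch of the general DTF definitions; the function names and item parameters are illustrative assumptions, and the article's improved statistics additionally account for sampling variability in the item parameter estimates, which is not reproduced here.

```python
# Minimal DTF sketch for 2PL items (illustrative only; not the article's
# improved statistics, which also handle item-parameter sampling variability).
import numpy as np
from scipy.stats import norm

def test_characteristic_curve(theta, a, b):
    """Expected total score at each theta for a set of 2PL items."""
    p = 1.0 / (1.0 + np.exp(-a[None, :] * (theta[:, None] - b[None, :])))
    return p.sum(axis=1)

def dtf_statistics(a_ref, b_ref, a_foc, b_foc, n_quad=101):
    theta = np.linspace(-6, 6, n_quad)
    weights = norm.pdf(theta)
    weights /= weights.sum()                       # normalized quadrature weights
    diff = (test_characteristic_curve(theta, a_ref, b_ref)
            - test_characteristic_curve(theta, a_foc, b_foc))
    signed_dtf = np.sum(weights * diff)            # DIF effects may cancel across theta
    unsigned_dtf = np.sum(weights * np.abs(diff))  # magnitude regardless of direction
    return signed_dtf, unsigned_dtf

# Hypothetical parameters: the focal group finds the last item slightly harder
a = np.array([1.2, 0.9, 1.5, 1.1])
b_ref = np.array([-0.5, 0.0, 0.4, 1.0])
b_foc = np.array([-0.5, 0.0, 0.4, 1.3])
print(dtf_statistics(a, b_ref, a, b_foc))
```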
Equivalence tests are an alternative to traditional difference-based tests for demonstrating a lack of association between two variables. While several recent studies have investigated equivalence tests for comparing means, little research has been conducted on equivalence methods for evaluating the equivalence or similarity of two correlation coefficients or two regression coefficients. The current project proposes novel tests, derived from the two one-sided tests (TOST) method (Schuirmann, 1987, J. Pharmacokinet. Biopharm., 15, 657) and the equivalence test of Anderson and Hauck (1983, Commun. Stat. Theory Methods, 12, 2663), for evaluating the equivalence of two correlation or regression coefficients. A simulation study was used to evaluate the performance of these tests and to compare them with the common, yet inappropriate, method of assessing equivalence via non-rejection of the null hypothesis in difference-based tests. Results demonstrate that equivalence tests have more accurate probabilities of declaring equivalence than difference-based tests. However, equivalence tests require large sample sizes to ensure adequate power. We recommend the Anderson-Hauck equivalence test over the TOST method for comparing correlation or regression coefficients.
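As a hedged sketch of how the TOST logic could be applied to two independent correlations, the example below Fisher-z transforms each correlation and runs two one-sided normal-theory tests against an equivalence bound. The standard error formula, the decision to state the bound delta on the Fisher-z scale, and all function and variable names are assumptions made for illustration, not the authors' exact procedure.

```python
# Minimal TOST sketch for the equivalence of two independent correlations.
# Assumptions (not taken from the article): Fisher's z transformation,
# normal-theory SE sqrt(1/(n1-3) + 1/(n2-3)), bound delta on the z scale.
import numpy as np
from scipy import stats

def tost_two_correlations(r1, n1, r2, n2, delta, alpha=0.05):
    """Two one-sided tests of |z1 - z2| < delta on the Fisher-z scale."""
    z1, z2 = np.arctanh(r1), np.arctanh(r2)        # Fisher z transforms
    se = np.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))  # SE of z1 - z2
    diff = z1 - z2
    # H01: diff <= -delta, rejected when diff is significantly above -delta
    p_lower = 1.0 - stats.norm.cdf((diff + delta) / se)
    # H02: diff >= +delta, rejected when diff is significantly below +delta
    p_upper = stats.norm.cdf((diff - delta) / se)
    p_tost = max(p_lower, p_upper)                 # equivalence needs both rejections
    return {"diff_z": diff, "p_tost": p_tost, "equivalent": p_tost < alpha}

print(tost_two_correlations(r1=0.32, n1=250, r2=0.28, n2=240, delta=0.15))
```

With these illustrative inputs the TOST p-value exceeds .05, so equivalence would not be declared despite the small observed difference, which mirrors the abstract's point that equivalence tests need large samples for adequate power.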
Measurement invariance (MI) is often concluded from a nonsignificant chi-square difference test. Researchers have also proposed using changes in goodness-of-fit indices (∆GOFs) instead. Both of these commonly used methods for testing MI have important limitations. To address these issues, Yuan and Chan (2016) proposed replacing the chi-square difference test commonly used to test MI with an equivalence test (EQ). Due to concerns about the EQ's power, Yuan and Chan also created an adjusted version (EQ-A), but provided little evaluation of either procedure. The current study evaluated the Type I error and power of both the EQ and the EQ-A, and compared their performance to that of the traditional chi-square difference test and ∆GOFs. The EQ for nested model comparisons was the only procedure that always maintained empirical error rates below the nominal alpha level. Results also highlight that, to ensure adequate power, the EQ requires either larger sample sizes than traditional difference-based approaches or equivalence bounds based on RMSEA values larger than conventional cutoffs (e.g., > .05). We do not recommend Yuan and Chan's proposed adjustment (EQ-A) over the EQ.
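As a hedged illustration of the equivalence-testing logic for a nested model comparison, the sketch below compares an observed chi-square difference with the alpha-quantile of a noncentral chi-square distribution whose noncentrality is implied by an RMSEA-based equivalence bound. The conversion lambda0 = (N - 1) * ddf * eps0^2 and the function names are my assumptions for illustration, not necessarily Yuan and Chan's exact formulation.

```python
# Minimal sketch of an RMSEA-based equivalence test for a nested model
# comparison (in the spirit of the EQ). Assumption: the equivalence bound is
# an RMSEA value eps0, converted to a noncentrality parameter via
# lambda0 = (N - 1) * ddf * eps0**2.
from scipy.stats import ncx2

def equivalence_test_nested(delta_chi2, ddf, N, eps0=0.05, alpha=0.05):
    """Conclude a negligible difference in misspecification if the observed
    chi-square difference falls below the alpha-quantile of a noncentral
    chi-square with df = ddf and noncentrality lambda0."""
    lambda0 = (N - 1) * ddf * eps0 ** 2                  # noncentrality implied by the bound
    critical = ncx2.ppf(alpha, df=ddf, nc=lambda0)       # reject-H0 cutoff
    p_value = ncx2.cdf(delta_chi2, df=ddf, nc=lambda0)   # left-tail probability
    return {"critical": critical, "p_value": p_value,
            "equivalent": delta_chi2 < critical}

# Hypothetical example: Delta chi-square = 18.4 on 12 df, N = 600, bound RMSEA = .05
print(equivalence_test_nested(delta_chi2=18.4, ddf=12, N=600))
```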