Position effects may influence examinees' test performance in several ways and trigger other psychometric issues, such as Differential Item Functioning (DIF). This study supplies test forms in which the items are ordered by difficulty level (from easy to difficult or from difficult to easy) to determine whether item position produces DIF and whether the DIF detection methods agree with each other. Research Methods: The Mantel-Haenszel (MH) and Logistic Regression (LR) methods were used to identify whether the items in the tests exhibit DIF.
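The Mantel-Haenszel procedure named above compares the odds of a correct response for the reference and focal groups within matched total-score strata. The following is a minimal sketch of that statistic, not the study's actual analysis code; the function name, inputs, and example counts are illustrative.

```python
import numpy as np

def mantel_haenszel_dif(correct_ref, total_ref, correct_foc, total_foc):
    """Mantel-Haenszel common odds ratio for one studied item.

    Each argument is an array with one entry per total-score stratum:
    counts of correct responses and group sizes for the reference and
    focal groups. Returns (alpha_MH, MH D-DIF on the ETS delta scale).
    """
    correct_ref = np.asarray(correct_ref, dtype=float)
    total_ref = np.asarray(total_ref, dtype=float)
    correct_foc = np.asarray(correct_foc, dtype=float)
    total_foc = np.asarray(total_foc, dtype=float)

    wrong_ref = total_ref - correct_ref
    wrong_foc = total_foc - correct_foc
    n = total_ref + total_foc  # stratum sizes

    # alpha_MH = sum_k(A_k * D_k / n_k) / sum_k(B_k * C_k / n_k)
    alpha = np.sum(correct_ref * wrong_foc / n) / np.sum(wrong_ref * correct_foc / n)
    # ETS delta metric: MH D-DIF = -2.35 * ln(alpha_MH); 0 means no DIF
    delta = -2.35 * np.log(alpha)
    return alpha, delta

# Illustrative counts in two score strata where both groups perform equally:
alpha, delta = mantel_haenszel_dif([30, 40], [60, 50], [15, 40], [30, 50])
```

With equal within-stratum success odds, alpha_MH is 1 and the delta value is 0, i.e. the item shows no DIF under this criterion.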
Computerized Adaptive Tests (CAT) are attracting more attention than ever from institutions, especially those recruiting students worldwide, because CAT does not present the same items to different individuals taking the test. This study investigated measurement precision and test length in computerized adaptive testing (CAT) under different conditions. The research was implemented as a Monte Carlo simulation study. In line with the purpose of the study, 500 items whose response probabilities were modeled with the three-parameter logistic (3PL) model were generated. Fixed-length (15, 20) and standard-error (SE < .30, SE < .50) termination rules were used. Additionally, in comparing termination rules, different starting rules (θ = 0, -1 < θ < 1), ability estimation methods (Maximum Likelihood Estimation (MLE), Expected a Posteriori (EAP), and Maximum a Posteriori (MAP)), and item selection methods (Kullback-Leibler Information (KLI) and Maximum Fisher Information (MFI)) were selected, since these are critical in CAT algorithms. Twenty-five replications were performed for each condition in the generated data. The results were evaluated using the RMSE, bias, and fidelity criteria. R software was used for data generation and analyses. As a result of the study, choosing the starting rule as θ = 0 or -1 < θ < 1 did not cause a significant difference in measurement precision or test length. The termination rule with lower RMSE and bias values than the other conditions was the 0.30 SE rule. When the EAP ability estimation method was used, lower RMSE and bias values were obtained than with MLE. The KLI item selection method likewise had lower RMSE and bias values than MFI.
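The evaluation criteria used throughout this abstract (RMSE, bias, and fidelity) are simple functions of the true and estimated ability values. This is a minimal sketch of those criteria under the usual definitions, not the study's code; the function name and arrays are illustrative.

```python
import numpy as np

def precision_criteria(theta_true, theta_hat):
    """RMSE, bias, and fidelity (correlation) of ability estimates.

    theta_true: generated (true) abilities; theta_hat: CAT estimates.
    """
    theta_true = np.asarray(theta_true, dtype=float)
    theta_hat = np.asarray(theta_hat, dtype=float)
    err = theta_hat - theta_true

    rmse = np.sqrt(np.mean(err ** 2))       # root mean squared error
    bias = np.mean(err)                      # signed average error
    fidelity = np.corrcoef(theta_true, theta_hat)[0, 1]  # Pearson r
    return rmse, bias, fidelity

# Illustrative estimates shifted upward by a constant 0.1:
rmse, bias, fidelity = precision_criteria([0.0, 1.0, -1.0], [0.1, 1.1, -0.9])
```

A constant shift shows why both criteria are reported: here RMSE and bias both equal 0.1, while fidelity stays at 1.0 because the rank order is preserved.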
This research compared the equating errors of scale transformation methods (mean-mean (MM), mean-sigma (MS), Haebara (HB), and Stocking-Lord (SL)) in true score equating based on item response theory (IRT) under different conditions. In line with the purpose of the study, 7,200 dichotomous data sets consistent with the two- and three-parameter logistic models were generated with 50 replications under the conditions of sample size (500, 1,000, 3,000, 10,000), test length (40, 50, 80), common-item rate (20%, 30%, 40%), model used in parameter estimation (two- and three-parameter logistic models (2PLM and 3PLM)), and ability distribution of the groups (similar (N(0,1)–N(0,1)) or different (N(0,1)–N(0.5,1))). The common-item nonequivalent groups equating design was used. R software was used for data generation and analyses. The results were evaluated using the equating error (RMSD) criterion. Considering all conditions, the RMSD values of the SL method were higher than those of the other methods, while the MM and MS methods produced similar RMSD values. In addition, when the RMSD values of the scale transformation methods were compared, similar results were obtained whether 2PLM or 3PLM was used; as the sample size and test length increased, the equating errors of all methods except SL decreased; and the methods had lower RMSD values when the common-item rate was 40% and the ability distributions of the groups were similar.
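The MM and MS methods compared above derive the linking constants A and B of the transformation θ_old = A·θ_new + B from the common items' parameter estimates: mean-sigma uses the means and standard deviations of the difficulty parameters, while mean-mean also uses the mean discriminations. A minimal sketch under those standard definitions follows; it is illustrative, not the study's code, and omits the characteristic-curve (HB, SL) methods.

```python
import numpy as np

def mean_sigma(b_new, b_old):
    """Mean-sigma linking: A from SDs of common-item difficulties,
    B from their means (b_old = A * b_new + B on average)."""
    b_new, b_old = np.asarray(b_new, float), np.asarray(b_old, float)
    A = np.std(b_old) / np.std(b_new)
    B = np.mean(b_old) - A * np.mean(b_new)
    return A, B

def mean_mean(a_new, b_new, a_old, b_old):
    """Mean-mean linking: A from mean discriminations (a_old = a_new / A),
    B from mean difficulties, as in mean-sigma."""
    a_new, a_old = np.asarray(a_new, float), np.asarray(a_old, float)
    b_new, b_old = np.asarray(b_new, float), np.asarray(b_old, float)
    A = np.mean(a_new) / np.mean(a_old)
    B = np.mean(b_old) - A * np.mean(b_new)
    return A, B

# Illustrative common-item difficulties related exactly by b_old = 2*b_new + 0.5:
A, B = mean_sigma([0.0, 1.0], [0.5, 2.5])
```

When the old-form difficulties are an exact linear function of the new-form ones, both methods recover the slope and intercept of that relation (here A = 2, B = 0.5).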
The aim of this study was to measure the effect of different termination rules on measurement precision and test length in computerized adaptive testing (CAT). The research was implemented as a Monte Carlo simulation study. The data generation for the computerized adaptive test was carried out using the "catR" package. In comparing termination rules, starting rules (b=0 and -1