“…To have the examinee θ distribution in the full dataset reflected in the drawn samples, the examinees' θ levels were converted into categorical data by assigning a category number to θs at intervals of 0.25 (e.g., θ = 3.00…2.75 = 1; θ = 2.749…2.50 = 2); in this manner, 24 discrete θ levels were obtained. Then, using the θ levels as strata in SPSS 20's (IBM Corp., 2011) complex samples module, samples of 150 (Harwell & Janosky, 1991), 250 (Goldman & Raju, 1986; Harwell & Janosky, 1991), 500 (Akour & Al-Omari, 2013; Baker, 1998; Gao & Chen, 2005; Goldman & Raju, 1986; Hulin et al., 1982; Thissen & Wainer, 1982), 1,000 (Goldman & Raju, 1986; Hulin et al., 1982; Lord, 1968; Thissen & Wainer, 1982; Weiss & von Minden, 2012; Yen, 1987), 2,000 (Gao & Chen, 2005; Hulin et al., 1982; Ree & Jensen, 1980; Yoes, 1995), 3,000 (Tang et al., 1993), and 5,000 (Akour & Al-Omari, 2013) that had been tested in previous research (including those conducted in one- and two-parameter logistic models) on IRT-based calibration sample size as well as two uncommon sample sizes (350 and 750) were drawn. These samples were drawn from each of the datasets with 100, 200, 300, and 500 items and 10,000 examinee responses.…”
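
The binning and stratified draw described in the passage can be illustrated with a minimal Python sketch. It is not the SPSS complex samples procedure the authors used. The function names `theta_category` and `stratified_sample` are hypothetical, the θ range of -3.00 to 3.00 is inferred from the 24 levels of width 0.25, and proportional allocation with largest-remainder rounding is an assumption about how the strata were sampled. The simulated θ values are placeholders, not the study's data.

```python
import math
import random
from collections import defaultdict


def theta_category(theta, top=3.0, width=0.25, n_levels=24):
    """Map theta to a category 1..n_levels, counting down from `top`.

    Follows the quoted example: 3.00...2.75 -> 1, 2.749...2.50 -> 2.
    Values outside the assumed range are clamped to the end categories.
    """
    k = math.ceil((top - theta) / width)
    return min(max(k, 1), n_levels)


def stratified_sample(thetas, n, seed=0):
    """Draw n examinee indices, allocating to theta strata proportionally.

    Proportional allocation is an assumption made for this sketch.
    """
    if n > len(thetas):
        raise ValueError("sample size exceeds number of examinees")
    rng = random.Random(seed)

    # Group examinee indices by theta category (the strata).
    strata = defaultdict(list)
    for idx, t in enumerate(thetas):
        strata[theta_category(t)].append(idx)

    # Proportional quotas, rounded down, then hand out the remaining
    # draws to the strata with the largest fractional remainders.
    total = len(thetas)
    quotas = {k: n * len(v) / total for k, v in strata.items()}
    alloc = {k: int(q) for k, q in quotas.items()}
    leftover = n - sum(alloc.values())
    by_remainder = sorted(quotas, key=lambda k: quotas[k] - alloc[k], reverse=True)
    for k in by_remainder[:leftover]:
        alloc[k] += 1

    # Simple random sampling without replacement inside each stratum.
    sample = []
    for k, members in strata.items():
        sample.extend(rng.sample(members, alloc[k]))
    return sorted(sample)


# Placeholder data: 10,000 simulated thetas clipped to [-3, 3].
_rng = random.Random(42)
thetas = [max(-3.0, min(3.0, _rng.gauss(0.0, 1.0))) for _ in range(10_000)]

# The nine sample sizes named in the passage.
for size in (150, 250, 350, 500, 750, 1000, 2000, 3000, 5000):
    drawn = stratified_sample(thetas, size, seed=size)
    print(size, len(drawn))
```

Each drawn sample keeps roughly the same share of examinees per θ level as the full pool, which is the property the passage attributes to using the θ levels as strata.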