1993
DOI: 10.1002/j.2333-8504.1993.tb01570.x

The Effect of Small Calibration Sample Sizes on TOEFL IRT‐Based Equating

Abstract: The present study compared the performance of LOGIST and BILOG on TOEFL IRT‐based scaling and equating using both real and simulated data and two calibration structures. Applications of IRT for the TOEFL program are based on the three‐parameter logistic (3PL) model. The results of the study show that item parameter estimates obtained from the smaller real data sample sizes were more consistent with the larger sample estimates when based on BILOG than when based on LOGIST. In addition, the root mean squared err…
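For readers unfamiliar with the 3PL model named in the abstract, the item response function can be sketched as follows. This is a minimal illustration of the standard textbook form, not code from the study: the function name `p_correct_3pl` and the default scaling constant `D = 1.7` are assumptions made here, and neither LOGIST nor BILOG is involved.

```python
import math


def p_correct_3pl(theta, a, b, c, D=1.7):
    """Probability of a correct response under the 3PL model.

    theta: examinee ability
    a: item discrimination
    b: item difficulty
    c: lower asymptote (pseudo-guessing parameter)
    D: scaling constant (1.7 is a common convention; assumed here)
    """
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


# When ability equals difficulty, the probability sits halfway
# between the lower asymptote c and 1.
print(p_correct_3pl(theta=0.0, a=1.0, b=0.0, c=0.2))
```

Each item carries three parameters (a, b, c), which is one reason the citing papers below discuss larger calibration samples for the 3PL model than for the 1PL model.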

citations
Cited by 11 publications
(12 citation statements)
references
References 9 publications
“…This proves the use of a number of samples that are slightly unsuitable using the 3-PL estimation model. Tang, Way, and Carey (1993) suggested the use of a large sample size when estimating using a 3-PL model that is at least 1000 people so that the resulting parameter estimation are more accurate and stable. Conversely, small size sample usage is still able to provide stability in parameter estimation when using the 1-PL model.…”
Section: Ability Parameter Estimation
Citation type: mentioning
confidence: 99%
“…Boldt (1993) compared linking based on the 3PL IRT model and a modified Rasch model (common nonzero lower asymptote) and concluded that the 3PL model should not be used if sample sizes are small. Tang et al (1993) compared the performance of the computer programs LOGIST and BILOG (see Carlson and von Davier, Chap. 5, this volume, for more on these programs) on TOEFL 3PL IRT-based linking.…”
Section: Item Response Theory True-score Linking
Citation type: mentioning
confidence: 99%
“…Hulin, Lissak, and Drasgow (1982) also concluded that a sample of 1,000 was necessary with 60 items to accurately estimate item parameters in the 3PLM. Although Ree and Jensen (1980) stated that accurate item parameter estimates require only 500 examinees in the 3PLM, with empirical support from studies by Patsula and Gessaroli (1995); Tang, Way, and Carey (1993); Yen (1987); and Yoes (1995), Lord's (1968) suggestion to use 1,000 examinees as the minimum item calibration sample size was accepted by many IRT researchers. However, some studies that supported Ree and Jensen's finding that sample sizes less than 1,000 can be used without losing much estimation accuracy were also published.…”
Section: Sample Size Requirements In Item Response Theory-based Item
Citation type: mentioning
confidence: 99%
“…To have the examinee θ distribution in the full dataset reflected in the drawn samples, the examinees' θ levels were converted into categorical data by assigning a category number to θs at interval of 0.25 (e.g., θ = 3.00…2.75 = 1; θ = 2.749…2.50 = 2); in this manner, 24 discrete θ levels were obtained. Then, using the θ levels as strata in SPSS 20's (IBM Corp., 2011) complex samples module, samples of 150 (Harwell & Janosky, 1991), 250 (Goldman & Raju, 1986;Harwell & Janosky, 1991), 500 (Akour & Al-Omari, 2013;Baker, 1998;Gao & Chen, 2005;Goldman & Raju, 1986;Hulin et al, 1982;Thissen & Wainer, 1982), 1,000 (Goldman & Raju, 1986;Hulin et al, 1982;Lord, 1968;Thissen & Wainer, 1982;Weiss & von Minden, 2012;Yen, 1987), 2,000 (Gao & Chen, 2005;Hulin et al, 1982;Ree & Jensen, 1980;Yoes, 1995), 3,000 (Tang et al, 1993), and 5,000 (Akour & Al-Omari, 2013) that had been tested in previous research (including those conducted in one-and two-parameter logistic models) on IRT-based calibration sample size as well as two uncommon sample sizes (350 and 750) were drawn. These samples were drawn from each of the datasets with 100, 200, 300, and 500 items and 10,000 examinee responses.…”
Section: Drawing Calibration Samples
Citation type: mentioning
confidence: 99%
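
The stratified draw described in the last citation statement (θ binned at intervals of 0.25 into 24 levels, then sampled by stratum) can be sketched roughly as below. The citing authors used the complex samples module of SPSS 20; this NumPy version is only a stand-in, and the function names, the proportional allocation rule, and the simulated θ values are all assumptions made for illustration.

```python
import numpy as np


def theta_to_stratum(theta, top=3.0, width=0.25, n_strata=24):
    """Map theta to a stratum number: 1 for 3.00..2.75, 2 for 2.749..2.50, etc."""
    theta = np.clip(np.asarray(theta, dtype=float), top - n_strata * width, top)
    idx = np.ceil((top - theta) / width).astype(int)
    # theta == top gives 0 from ceil; it belongs to the first stratum.
    return np.clip(idx, 1, n_strata)


def stratified_sample(theta, n, rng):
    """Draw about n examinee indices, allocating to strata proportionally.

    Per-stratum counts are rounded, so the total can differ slightly from n.
    """
    strata = theta_to_stratum(theta)
    chosen = []
    for s in np.unique(strata):
        members = np.flatnonzero(strata == s)
        k = min(int(round(n * members.size / strata.size)), members.size)
        chosen.append(rng.choice(members, size=k, replace=False))
    return np.concatenate(chosen)


rng = np.random.default_rng(0)
theta = rng.normal(size=10_000)      # hypothetical abilities, not study data
sample = stratified_sample(theta, n=1_000, rng=rng)
print(sample.size)
```

Proportional allocation keeps the θ distribution of each drawn sample close to that of the full dataset, which is the stated purpose of the stratification.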