1998
DOI: 10.1177/01466216980222003

A Comparison of Linking and Concurrent Calibration Under Item Response Theory

Abstract: Applications of item response theory (IRT) to practical testing problems, including equating, differential item functioning, and computerized adaptive testing, require a common metric for item parameter estimates. This study compared three methods for developing a common metric under IRT: (1) linking separate calibration runs using equating coefficients from the characteristic curve method, (2) concurrent calibration based on marginal maximum a posteriori estimation, and (3) concurrent calibration based on mar…
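The characteristic curve method in method (1) chooses a slope A and an intercept B that put the new calibration's anchor-item parameters on the old scale. It uses the transformations b* = A·b + B, a* = a/A and c* = c. A and B are picked to minimize the squared difference between the two anchor-item test characteristic curves (TCCs). The block below is a minimal sketch of that idea, in the style of the Stocking and Lord (1983) criterion. It assumes the three-parameter logistic (3PL) model with D = 1.7, a uniform θ grid from -4 to 4, and a simple compass (pattern) search as the optimizer. The function names, the grid, the optimizer and the example item parameters are all hypothetical. They are not the procedure, software, or data used in this study.

```python
import math

D = 1.7  # logistic scaling constant (assumed; D = 1.0 is also used)


def p3pl(theta, a, b, c):
    """Probability of a correct response under the 3PL model."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def tcc(theta, items):
    """Test characteristic curve: expected number-correct score on the items at theta."""
    return sum(p3pl(theta, a, b, c) for a, b, c in items)


def stocking_lord_loss(A, B, old_items, new_items, thetas):
    """Squared TCC differences over the theta grid, after transforming the
    new-form anchor parameters onto the old scale with slope A, intercept B."""
    transformed = [(a / A, A * b + B, c) for a, b, c in new_items]
    return sum((tcc(t, old_items) - tcc(t, transformed)) ** 2 for t in thetas)


def stocking_lord(old_items, new_items, thetas=None, tol=1e-7):
    """Estimate the linking coefficients A and B by compass search
    (an illustrative optimizer choice; any minimizer could be used)."""
    if thetas is None:
        thetas = [-4.0 + 0.2 * k for k in range(41)]  # hypothetical uniform grid
    A, B, step = 1.0, 0.0, 0.5
    best = stocking_lord_loss(A, B, old_items, new_items, thetas)
    while step > tol:
        for dA, dB in ((step, 0.0), (-step, 0.0), (0.0, step), (0.0, -step)):
            cand_A, cand_B = A + dA, B + dB
            if cand_A <= 0.0:
                continue  # the slope must stay positive
            loss = stocking_lord_loss(cand_A, cand_B, old_items, new_items, thetas)
            if loss < best:
                A, B, best = cand_A, cand_B, loss
                break  # keep the current step size after a successful move
        else:
            step /= 2.0  # no axis move improved the loss: refine the step
    return A, B


# Hypothetical anchor items (a, b, c) on the old scale.
old = [(1.0, -1.0, 0.2), (1.2, -0.5, 0.2), (0.8, 0.0, 0.15), (1.5, 0.5, 0.2), (1.1, 1.0, 0.25)]
# Simulated new calibration whose metric differs by A = 1.2, B = 0.3.
new = [(a * 1.2, (b - 0.3) / 1.2, c) for a, b, c in old]
A, B = stocking_lord(old, new)
print(round(A, 3), round(B, 3))  # should be close to 1.2 and 0.3
```

Because the simulated new parameters are an exact linear rescaling of the old ones, the criterion can reach zero and the search should recover A and B. With real separate calibrations the two TCCs never coincide exactly, so the minimum is positive. Implementations also differ in which θ points and weights they sum over.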

Cited by 99 publications (98 citation statements). References: 16 publications.
“…This finding is consistent with the results of Gök (2012), which showed that sample size had no positive effect on methods, while test length had a positive effect. In addition, studies have shown that equating performance is in accordance with equating steps and therefore, a one-step process is better than a two-step process (Chu, 2002; Hanson & Beguin, 1999b; Kim & Cohen, 1998). Consistent with other studies, the present study found the highest bias value in the IRM-SC.…”
Section: Discussion (supporting)
confidence: 90%
“…Two studies indicated that concurrent calibration as is implemented in BILOG-MG performed equally well in comparison with traditional methods in vertical equating (Béguin & Hanson, 2001;Kim & Cohen, 1998). Because in our study the number of anchor items was large and because these anchor items did form a representative sample of the test, technical issues of vertical equating should not be a problem here.…”
Section: Discussion (mentioning)
confidence: 72%
“…A formal exposure can be found in Bock and Zimowski (1995) and Mislevy (1987). Kim and Cohen (1998) compared concurrent calibration methods with the approach based on separate calibrations using equating coefficients from the characteristic curve method. They used a simulated data set consisting of two groups differing in ability level.…”
Section: Developmental Score Scales and Vertical Equating (mentioning)
confidence: 99%
“…Using the marginal maximum likelihood estimation method, Kim and Cohen (1998) examined the separate calibration method using BILOG (Mislevy & Bock, 1982) with the Stocking and Lord method (Stocking & Lord, 1983) and the concurrent calibration method using MULTILOG (Thissen, 1991). They concluded that the two methods provided similar results except when the number of common items was small (e.g., 5 out of 50), where separate calibration provided more accurate results.…”
Section: Concurrent vs. Separate Calibrations in UIRT (mentioning)
confidence: 99%
“…However, Hanson and Beguin (2002) pointed out that the differences between concurrent and separate calibration results in the case of non-equivalent groups in the Kim and Cohen (1998) study were confounded with the different computer programs:…”
Section: Concurrent vs. Separate Calibrations in UIRT (mentioning)
confidence: 99%