This study examines the effect of items exhibiting differential item functioning (DIF) on test equating through multilevel item response models (MIRMs) and traditional IRMs. The performance of three equating approaches was investigated under 24 simulation conditions defined by sample size, test length, DIF magnitude, and test type. The MIRMs, in which DIF effects were added as model parameters, were compared with the Stocking-Lord (SL) method, an IRM-based separate-calibration linking method, and with concurrent calibration. The results showed that the methods performed differently across the conditions examined. More specifically, the MIRMs were able to identify the DIF items, carry out the equating, and eliminate the bias caused by DIF within a single analysis. However, this does not mean that MIRMs are always the best approach, since increasing the sample size and test length generally improved IRM-based equating, whereas the MIRMs were less sensitive to these two conditions. Among the IRM-based methods, separate calibration was more affected by the presence of DIF items than concurrent calibration, and this effect was most pronounced when the DIF items were among the common (anchor) items and the magnitude of DIF was at level C (i.e., large DIF).
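To make the separate-calibration linking concrete, the sketch below illustrates the general idea behind the Stocking-Lord method referred to above: the linking constants A and B are chosen to minimize the squared distance between the test characteristic curves of the common items on the two forms. This is a minimal Python sketch under a 2PL model with hypothetical, simulated parameter values, not the software or design used in the study.

```python
import numpy as np
from scipy.optimize import minimize

def p_2pl(theta, a, b):
    # 2PL item characteristic curves; returns shape (n_theta, n_items)
    return 1.0 / (1.0 + np.exp(-a[None, :] * (theta[:, None] - b[None, :])))

def sl_loss(params, theta, a_ref, b_ref, a_new, b_new):
    # Stocking-Lord criterion: squared distance between the reference-form
    # test characteristic curve and the rescaled new-form curve.
    A, B = params
    a_t = a_new / A          # rescale discriminations to the reference metric
    b_t = A * b_new + B      # rescale difficulties to the reference metric
    tcc_ref = p_2pl(theta, a_ref, b_ref).sum(axis=1)
    tcc_new = p_2pl(theta, a_t, b_t).sum(axis=1)
    return np.mean((tcc_ref - tcc_new) ** 2)

# Hypothetical common-item parameters: the "new" form sits on a scale
# stretched by A_true and shifted by B_true relative to the reference form.
rng = np.random.default_rng(0)
a_ref = rng.uniform(0.8, 2.0, size=10)
b_ref = rng.normal(0.0, 1.0, size=10)
A_true, B_true = 1.2, 0.4
a_new = a_ref * A_true
b_new = (b_ref - B_true) / A_true

theta = np.linspace(-4.0, 4.0, 81)   # quadrature points over the ability scale
result = minimize(sl_loss, x0=[1.0, 0.0],
                  args=(theta, a_ref, b_ref, a_new, b_new))
print("Recovered linking constants A, B:", result.x)  # approx. (1.2, 0.4)
```

The sketch also hints at the mechanism behind the finding reported above: the SL criterion assumes the common items function identically across groups, so when anchor items carry DIF, their distorted parameter estimates pull the recovered A and B away from their true values, propagating bias into the equated scores.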