1986
DOI: 10.1111/j.1745-3984.1986.tb00250.x
The Impact of Item Deletion on Equating Conversions and Reported Score Distributions

Abstract: A formal analysis of the effects of item deletion on equating/scaling functions and reported score distributions is presented. The analysis has two components: analytical and empirical. The analytical decomposition demonstrates how the effects of item characteristics, test properties, individual examinee responses, and rounding rules combine to produce the item deletion effect on the equating/scaling function and candidate scores. In addition to demonstrating how the deleted item's psychometri…
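The decomposition the abstract describes can be illustrated with a toy linear-equating sketch. This is not the paper's actual procedure, and every statistic below is hypothetical: it only shows the mechanism by which deleting one item shifts the form's mean and standard deviation and thereby shifts the whole raw-to-scale conversion.

```python
# Minimal sketch of a linear equating conversion and the shift produced by
# deleting one item. All numbers are illustrative, not from the paper.

def linear_equating(mean_x, sd_x, mean_y, sd_y):
    """Return a function mapping raw scores on new form X to the scale of
    old form Y by matching means and standard deviations."""
    slope = sd_y / sd_x
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

# Hypothetical summary statistics for a 40-item new form X and old form Y.
to_scale = linear_equating(mean_x=25.0, sd_x=6.0, mean_y=26.0, sd_y=6.5)

# Suppose a flawed item answered correctly by 80% of examinees is deleted:
# each raw score drops by 0 or 1, so the form mean falls by the item's
# difficulty (0.8) and the SD changes slightly.
after_deletion = linear_equating(mean_x=24.2, sd_x=5.9, mean_y=26.0, sd_y=6.5)

for raw in (15, 25, 35):
    print(raw, round(to_scale(raw), 2), round(after_deletion(raw), 2))
```

Because the two conversions differ, scores reported from the original conversion after an item is dropped would be misaligned, which is the mechanism behind the reequating recommendation discussed in the citing papers below.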

Cited by 8 publications (6 citation statements)
References 3 publications
“…Lord and Wild (1985) compared the contribution of the four verbal item types to measurement accuracy of the GRE General Test, finding that the reading comprehension item type measures something slightly different from what is measured by sentence completion, analogy, or antonym item types. Dorans (1986) used IRT to study the effects of item deletion on equating functions and the score distribution on the SAT, concluding that reequating should be done when an item is dropped. Kingston and Holland (1986) compared equating errors using IRT and several other equating methods, and several equating designs, for equating the GRE General Test, with varying results depending on the specific design and method.…”
Section: Lord's Book, Applications of Item Response Theory to Practical Testing Problems
confidence: 99%
“…The comparison tells us what the relationship is between the score scale of the new test and the score scale used previously. We use this information to identify the corresponding scores on the old and new tests (Dorans, 1986; Haertel, 2004). Table 2 shows that the tests were long and assessed a variety of subject areas and grades. They had been rigorously developed and subjected to extensive internal and external reviews.…”
Section: Definition of Some Terms
confidence: 99%
“…The comparison tells us what the relationship is between the score scale of the new test and the score scale used previously. We use this information to identify the corresponding scores on the old and new tests (Dorans, 1986; Haertel, 2004). Once this is done, we can compare old and new test scores, apply previously established cut scores to the new test, and the like.…”
Section: Definition of Some Terms
confidence: 99%
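The use described in the snippet above, carrying a previously established cut score over to a new form through an equating conversion, can be sketched as follows. The conversion table and the cut score are hypothetical placeholders, not values from any cited study.

```python
# Minimal sketch (hypothetical numbers) of applying an old-form cut score
# to a new form via an equating conversion table.

# Raw-to-scale conversion for the new form, as an equating study might
# produce; here a simple illustrative linear table over 0..40 raw points.
conversion = {raw: 200 + 5 * raw for raw in range(0, 41)}

old_form_cut = 300  # passing score established on the reporting scale

# The corresponding raw cut on the new form is the lowest raw score whose
# scaled equivalent meets the established cut.
new_form_cut = min(raw for raw, scaled in conversion.items()
                   if scaled >= old_form_cut)
print(new_form_cut)  # → 20
```

If an item were later deleted from the new form, the conversion table itself would change, which is why reequating (rather than reusing the old table) is the course of action Dorans (1986) recommends.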