Deriving Comparable Scores for Computer Adaptive and Conventional Tests: An Example Using the SAT

Eignor, Daniel R.

doi:10.1002/j.2333-8504.1993.tb01566.x

Cited by 8 publications

(6 citation statements)

References 4 publications


“…For example, for Ito and Sykes's (2004) paper, the original sample design was for test level and each level consisted of two grades (e.g., Level 2 = Grades 4 and 5, etc.). For both Davis's (2003) and Eignor's (1993) papers, both tests were high school exit examinations. Because no information was provided on the breakdown of grade, we reported "high school" as the unit of sampling.…”

Section: Moderators Coded For Each Study (mentioning)

confidence: 99%

A Meta-Analysis of Testing Mode Effects in Grade K-12 Mathematics Tests

Wang, Jiao, Young, et al. 2007

Educational and Psychological Measurement


This study conducted a meta-analysis of computer-based and paper-and-pencil administration mode effects on K-12 student mathematics tests. Both initial and final results based on fixed- and random-effects models are presented. The results based on the final selected studies with homogeneous effect sizes show that the administration mode had no statistically significant effect on K-12 student mathematics tests. Only the moderator variable of computer delivery algorithm contributed to predicting the effect size. The differences in scores between test modes were larger for linear tests than for adaptive tests. However, such variables as study design, grade level, sample size, type of test, computer delivery method, and computer practice did not lead to differences in student mathematics scores between computer-based and paper-and-pencil modes.


“…A general approach to achieving comparability is through the design of the CAT tests. It is typically done through a series of simulation studies at the early stages and some real examinee studies at later stages (e.g., Eignor, 1993; Eignor & Schaeffer, 1995;Eignor, Stocking, Way, & Steffen, 1993;Schaeffer, Reese, Steffen, McKinley, & Mills, 1993;Schaeffer, Steffen, Golub-Smith, Mills, & Durso, 1995). The simulation studies are useful in designing the CAT and examining the various technical aspects of the tests.…”

Section: Comparability Issues Specific To CAT and P&P (mentioning)

confidence: 99%

“…The three aspects are discussed separately below. Eignor, 1993; Eignor, Stocking, Way & Steffen, 1993). In realistic testing settings, considerations should also be given to less explicit specification such as balancing keys, choosing passage topics and balancing references to gender, ethnicity and other background subject.…”

Section: The Validity Criterion (mentioning)

confidence: 99%

Evaluating Comparability in Computerized Adaptive Testing: Issues, Criteria and an Example

Wang, Kolen 2001

J Educational Measurement


When a computerized adaptive testing (CAT) version of a test co‐exists with its paper‐and‐pencil (P&P) version, it is important for scores from the CAT version to be comparable to scores from its P&P version. The CAT version may require multiple item pools for test security reasons, and CAT scores based on alternate pools also need to be comparable to each other. In this paper, we review research literature on CAT comparability issues and synthesize issues specific to these two settings. A framework of criteria for evaluating comparability was developed that contains the following three categories of criteria: validity criterion, psychometric property/reliability criterion, and statistical assumption/test administration condition criterion. Methods for evaluating comparability under these criteria as well as various algorithms for improving comparability are described and discussed. Focusing on the psychometric property/reliability criterion, an example using an item pool of ACT Assessment Mathematics items is provided to demonstrate a process for developing comparable CAT versions and for evaluating comparability. This example illustrates how simulations can be used to improve comparability at the early stages of the development of a CAT. The effects of different specifications of practical constraints, such as content balancing and item exposure rate control, and the effects of using alternate item pools are examined. One interesting finding from this study is that a large part of incomparability may be due to the change from number‐correct score‐based scoring to IRT ability estimation‐based scoring. In addition, changes in components of a CAT, such as exposure rate control, content balancing, test length, and item pool size were found to result in different levels of comparability in test scores.


“…Taking this into account, many graduate schools in Besides the advantages mentioned above, CAT in combination with IRT make it possible to calculate comparable proficiencies between individuals who answered different sets of items, and at different times [14,32]. This greatly facilitates evaluating constructs on a large-scale resulting in its use in important examinations, such as the Graduate Record Examination (GRE) [6,11], developed by the Educational Testing Service (ETS) in 1996; the TOEFL [10,12,33], also developed by ETS and the Armed Services Vocational Aptitude Battery Test [23,24], developed by the United States Department of Defense to select potential recruits for military service.…”

Section: Introduction (mentioning)

confidence: 99%

Academic English Proficiency Assessment Using a Computerized Adaptive Test

Cúri, Silva 2019

Tend. Mat. Apl. Comput.


This paper describes the steps to convert a paper-and-pencil English proficiency test for academic purposes, consisting of multiple choice items administered following the Admissible Probability Measurement Procedure [24], adopted by the graduate program at the Institute of Mathematics and Computer Sciences at the University of São Paulo (ICMC-USP), Brazil, to a computerized adaptive test (CAT) based on an Item Response Theory Model (IRT). Despite the fact that the program accepts various internationally recognized tests that attest non-native speakers' English proficiency, such as the Test of English as a Foreign Language (TOEFL), the International English Language Testing System (IELTS) and the Cambridge English: Proficiency (CPE), for instance, its requirement is incoherent in public universities in Brazil due to the cost, which ranges from US$ 200.00 to US$ 300.00 per exam. The TAI-PI software (Computerized Adaptive Test for English Proficiency), which was developed in Java language and SQLite, started to be used to assess the English proficiency of students on the program from October, 2013. The statistical methodology used was defined considering the history and aims of the test and adopted Samejima's Graded Response Model [21], the Kullback-Leibler information criterion for item selection, the a posteriori estimation method for latent trait [2] and the Shadow Test approach [29] to impose restrictions (content and test length) on the test composition of each individual. A description of the test design, the statistical methods used, and the results of a real application of TAI-PI for graduate students are presented in this paper, as well as the validation studies of the new methodology for pass or fail classification, showing the good quality of the new evaluation system and examination of improvement using the IRT and CAT methods.
