Item response theory (IRT) provides procedures for scoring tests including any combination of rated constructed-response and keyed multiple-choice items, in that each response pattern is associated with some modal or expected a posteriori estimate of trait level. However, various considerations that frequently arise in large-scale testing make response-pattern scoring an undesirable solution. Methods are described based on IRT that provide scaled scores, or estimates of trait level, for each summed score for rated responses, or for combinations of rated responses and multiple-choice items. These methods may be used to combine the useful scale properties of IRT-based scores with the practical virtues of a scale based on a summed score for each examinee. Index terms: graded response model, item response theory, ordered responses, polytomous models, scaled scores.
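The idea of attaching a trait estimate to each summed score can be illustrated with a minimal sketch. The abstract does not spell out the algorithm, so everything here is an assumption: items follow the graded response model (a two-category item reduces to a two-parameter logistic multiple-choice item), the prior is standard normal on a fixed grid, and the likelihood of each summed score is built up one item at a time by a recursive accumulation. The function names and the item parameters are hypothetical.

```python
import numpy as np

def grm_category_probs(theta, a, b):
    """Category probabilities under the graded response model.

    theta: grid of trait values, shape (Q,)
    a: item slope; b: increasing thresholds, length K-1
    Returns an array of shape (K, Q); each column sums to 1.
    """
    theta = np.asarray(theta, dtype=float)
    b = np.asarray(b, dtype=float)
    # Cumulative curves P(X >= k | theta) for k = 1..K-1
    cum = 1.0 / (1.0 + np.exp(-a * (theta[None, :] - b[:, None])))
    upper = np.vstack([np.ones_like(theta), cum])   # P(X >= 0) = 1
    lower = np.vstack([cum, np.zeros_like(theta)])  # P(X >= K) = 0
    return upper - lower

def summed_score_eap(items, n_quad=81):
    """EAP and posterior SD of theta for every possible summed score.

    items: list of (a, b) pairs (hypothetical parameters).
    Assumes a standard normal prior evaluated on an equally spaced grid.
    """
    theta = np.linspace(-4.0, 4.0, n_quad)
    prior = np.exp(-0.5 * theta**2)
    prior /= prior.sum()
    # like[s, q] = P(summed score = s | theta_q); before any item, score 0
    like = np.ones((1, n_quad))
    for a, b in items:
        probs = grm_category_probs(theta, a, b)
        new = np.zeros((like.shape[0] + probs.shape[0] - 1, n_quad))
        for k in range(probs.shape[0]):
            # Responding in category k shifts every earlier score up by k
            new[k:k + like.shape[0]] += like * probs[k]
        like = new
    joint = like * prior
    marginal = joint.sum(axis=1)            # P(summed score = s)
    eap = (joint * theta).sum(axis=1) / marginal
    var = (joint * (theta - eap[:, None])**2).sum(axis=1) / marginal
    return eap, np.sqrt(var), marginal

# Hypothetical test: two multiple-choice items and one 4-category rated item
items = [(1.2, [-0.5]), (0.8, [0.3]), (1.5, [-1.0, 0.0, 1.0])]
eap, sd, marginal = summed_score_eap(items)
for score, (m, s) in enumerate(zip(eap, sd)):
    print(f"summed score {score}: EAP = {m:.3f}, SD = {s:.3f}")
```

Each summed score from 0 to 5 receives one scaled score and one posterior standard deviation, so a score-to-scale conversion table can be printed once and applied to every examinee without response-pattern scoring.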
Multidimensional computerized adaptive testing (MCAT) provides a mechanism by which the simultaneous goals of accurate prediction and minimal testing time for a screening test could both be met. This article demonstrates the use of MCAT to administer a screening test for the Computerized Adaptive Testing–Armed Services Vocational Aptitude Battery (CAT-ASVAB) under a variety of manipulated conditions. CAT-ASVAB is a test battery administered via unidimensional CAT (UCAT) that is used to qualify applicants for entry into the U.S. military and assign them to jobs. The primary research question being evaluated is whether the use of MCAT to administer a screening test can lead to significant reductions in testing time from the full-length selection test, without significant losses in score precision. Different stopping rules, item selection methods, content constraints, time constraints, and population distributions for the MCAT administration are evaluated through simulation, and compared with results from a regular full-length UCAT administration.
A developmental scale for the North Carolina End-of-Grade Mathematics Tests was created using a subset of identical test forms administered to adjacent grade levels. Thurstone scaling and item response theory (IRT) techniques were employed to analyze the changes in grade distributions across these linked forms. Three variations of Thurstone scaling were examined, one based on Thurstone's 1925 procedure and two based on Thurstone's 1938 procedure. The IRT scaling was implemented using both BIMAIN and MULTILOG. All methods indicated that average mathematics performance improved from Grade 3 to Grade 8, with similar results for the two IRT analyses and one version of Thurstone's 1938 method. The standard deviations of the IRT scales did not show a consistent pattern across grades, whereas those produced by Thurstone's 1925 procedure generally decreased; one version of the 1938 method exhibited slightly increasing variation with increasing grade level, while the other version displayed inconsistent trends.