The goal of this study was to assess the feasibility of an approach to adaptive testing based on item models. A simulation study was designed to explore the effects of item modeling on score precision and bias, and two experimental tests were administered: an on-the-fly adaptive quantitative-reasoning test and a linear test. Results of the simulation study showed that, under different levels of isomorphicity, there was no bias, but precision of measurement was eroded, especially in the middle range of the true-score scale. However, the correlation between adaptive test scores and operational Graduate Record Examinations (GRE) test scores matched the test-retest correlation observed under operational conditions. Analyses of item functioning on linear forms suggested a high level of isomorphicity across items within models. The current study provides a promising first step toward significant cost savings and theoretical improvements in test-creation methodology for educational assessment.
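The sketch below is a minimal, illustrative Monte Carlo in the spirit of the simulation just described, not the authors' design: it assumes a Rasch (1PL) response model, a hypothetical `simulate` helper, and NumPy, and shows how widening the spread of instance difficulties within an item model (weaker isomorphicity) tends to inflate error variance in ability estimates while leaving average bias near zero.

```python
# Toy Monte Carlo (illustrative only): effect of within-model difficulty spread
# ("isomorphicity") on the bias and precision of Rasch ability estimates.
import numpy as np

rng = np.random.default_rng(0)

def simulate(sigma_model, n_examinees=2000, n_items=30):
    """sigma_model: SD of instance difficulties around their item model's nominal
    difficulty (0 = perfectly isomorphic). Returns (bias, rmse) of theta estimates."""
    theta = rng.normal(0.0, 1.0, n_examinees)              # true abilities
    b_nominal = np.linspace(-2.0, 2.0, n_items)            # model-level difficulties
    # Each examinee receives a different instance generated from each item model.
    b = b_nominal + rng.normal(0.0, sigma_model, (n_examinees, n_items))
    p = 1.0 / (1.0 + np.exp(-(theta[:, None] - b)))        # Rasch response probabilities
    x = (rng.random((n_examinees, n_items)) < p).astype(float)

    # Score with the nominal difficulties, as if every instance were isomorphic.
    grid = np.linspace(-4.0, 4.0, 161)
    pg = 1.0 / (1.0 + np.exp(-(grid[:, None] - b_nominal)))     # (grid points, items)
    loglik = x @ np.log(pg).T + (1.0 - x) @ np.log(1.0 - pg).T  # (examinees, grid points)
    theta_hat = grid[np.argmax(loglik, axis=1)]

    err = theta_hat - theta
    return err.mean(), np.sqrt((err ** 2).mean())

for sigma in (0.0, 0.3, 0.6):      # increasing departure from isomorphicity
    bias, rmse = simulate(sigma)
    print(f"sigma={sigma:.1f}  bias={bias:+.3f}  rmse={rmse:.3f}")
```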
Response generative modeling (RGM) is an approach to psychological measurement that involves a "grammar" capable of assigning a psychometric description to every item in a universe of items and of generating all the items in that universe. The purpose of this chapter is to: 1) elaborate on the rationale behind RGM; 2) review its roots and how it relates to current thinking on validity; and 3) assess its feasibility in a wide variety of domains. The chapter concludes with a brief review of theoretical approaches that could support a psychologically sound method of test construction and modeling.
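As a purely hypothetical illustration of the core idea of a generative "grammar" of items, the sketch below shows a template-based item model that can enumerate every instance in its small universe and attach a shared psychometric description to each one; the template, constraints, and difficulty value are invented for this example and are not taken from the chapter.

```python
# Toy "item model": a template with constrained variable slots that generates every
# item in its universe and tags each instance with one psychometric description.
import itertools
import random

TEMPLATE = "A train travels {d} km in {t} hours. What is its average speed in km/h?"

def universe():
    """Enumerate all instances the model can generate (distance divisible by time)."""
    for d, t in itertools.product(range(60, 301, 10), range(2, 6)):
        if d % t == 0:
            yield {"stem": TEMPLATE.format(d=d, t=t),
                   "key": d // t,
                   "model_id": "rate-01",       # all instances share one description
                   "difficulty": -0.4}          # hypothetical calibrated value

def generate_instance(rng=random):
    """Draw one isomorphic instance on the fly, e.g., for adaptive delivery."""
    return rng.choice(list(universe()))

print(generate_instance())
```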
Early work on automated scoring predated the ready availability of mechanisms for inexpensively delivering computer-based tests and collecting responses. Hence, this work used responses to conventionally delivered tasks that had somehow been translated to machine-readable form. The necessity of operating in this manner initially focused attention on the empirical characteristics of automated scores. As the availability of computer-based testing environments grew, it became possible to implement entire operational exams and, thus, to think broadly about the implications of automated scoring for validity. In this paper we argue that a comprehensive discussion of validity and automated scoring includes the interplay among construct definition and test and task design; examinee interface; tutorial; test development tools; automated scoring; and reporting, because in the development process these components affect one another. As modern validity theory postulates, the validation argument must, therefore, ideally provide not only empirical evidence of score relationships but also theoretical rationales to support a variety of design decisions. We further argue that the interdependency among computer-based test components provides a unique opportunity to greatly improve educational and occupational assessment.
Psychometric and architectural principles were integrated to create a general approach for scoring open-ended architectural site-design test problems. In this approach, solutions are examined and described in terms of design features, and those features are then mapped onto a scoring scale by means of scoring rules. This methodology was applied to two problems that had been administered as part of a national certification test. Because the test is not currently administered by computer, the paper-and-pencil solutions were first converted to machine-readable form. One problem dealt with the spatial arrangement of buildings in a country club, and the other called for regrading of a site by rearranging contours. In both instances, the results suggest that computer scoring is feasible.
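A minimal sketch of the general feature-extraction-then-rule-mapping idea follows; the feature names, thresholds, and 0-4 score scale are invented for illustration and do not reproduce the certification test's actual rubric.

```python
# Hypothetical illustration of scoring an open-ended site-design solution:
# extract design features, then map them onto a score scale with explicit rules.
from dataclasses import dataclass

@dataclass
class SiteDesignFeatures:
    setback_violations: int      # buildings placed inside required setbacks
    parking_within_limit: bool   # parking area does not exceed the allowed maximum
    drainage_slope_ok: bool      # regraded contours drain away from the buildings

def score(f: SiteDesignFeatures) -> int:
    """Map extracted design features onto a 0-4 scale via explicit, reviewable rules."""
    if f.setback_violations > 2:          # fatal flaw: automatic lowest score
        return 0
    points = 4 - f.setback_violations
    if not f.parking_within_limit:
        points -= 1
    if not f.drainage_slope_ok:
        points -= 1
    return max(points, 0)

# Example: one setback violation with acceptable parking and drainage scores a 3.
print(score(SiteDesignFeatures(1, True, True)))
```

Keeping the rules separate from feature extraction makes the scoring logic inspectable, so subject-matter experts can review and adjust the mapping without touching the geometric analysis of solutions.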