A framework for the evaluation and use of automated scoring of constructed-response tasks is provided that entails both evaluation of automated scoring and guidelines for its implementation and maintenance in the context of constantly evolving technologies. Validity issues and challenges associated with automated scoring are discussed within the framework. The fit between the scoring capability and the assessment purpose, the agreement between human and automated scores, associations with independent measures, the generalizability of automated scores as implemented in operational practice across different tasks and test forms, and the impact and consequences for the population and subgroups are proffered as integral evidence supporting the use of automated scoring. Specific evaluation guidelines are provided for using automated scoring to complement human scoring on tests used for high-stakes purposes. These guidelines are intended to generalize to new automated scoring systems and to existing systems as they change over time.
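As an illustration of the human-machine agreement evidence described above, the following sketch computes two statistics commonly used for that purpose, exact agreement and quadratically weighted kappa. The 0-6 integer score scale and the toy score vectors are assumptions made for illustration; they are not drawn from the framework itself.

# Minimal sketch of human-machine agreement statistics (illustrative only).
import numpy as np

def exact_agreement(human, machine):
    """Proportion of essays on which human and machine scores match exactly."""
    human, machine = np.asarray(human), np.asarray(machine)
    return np.mean(human == machine)

def quadratic_weighted_kappa(human, machine, min_score=0, max_score=6):
    """Quadratically weighted kappa between two integer score vectors."""
    scores = np.arange(min_score, max_score + 1)
    n = len(scores)
    # Observed joint distribution of (human, machine) score pairs.
    observed = np.zeros((n, n))
    for h, m in zip(human, machine):
        observed[h - min_score, m - min_score] += 1
    observed /= observed.sum()
    # Expected joint distribution under independence of the two marginals.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0))
    # Quadratic disagreement weights: zero on the diagonal, largest at the corners.
    weights = (scores[:, None] - scores[None, :]) ** 2 / (n - 1) ** 2
    return 1.0 - (weights * observed).sum() / (weights * expected).sum()

human_scores = [4, 3, 5, 2, 4, 4, 3, 5]     # hypothetical human ratings
machine_scores = [4, 3, 4, 2, 5, 4, 3, 5]   # hypothetical machine scores
print(exact_agreement(human_scores, machine_scores))
print(quadratic_weighted_kappa(human_scores, machine_scores))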
A single-administration classification reliability index is described that estimates the probability of consistently classifying examinees into mastery or nonmastery states, as if those examinees had been tested with two alternate forms. The procedure applies to any test used for classification purposes: the test is subdivided into two half-tests, each with its own cut score, such that the sum of the two half-test cut scores equals the cut score for the total test. The application of this pass-fail consistency index to binary-scored objective tests, nonbinary-scored performance tests, and tests containing both binary- and nonbinary-scored questions is presented. A calculation example is provided together with look-up tables.
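The core of the procedure can be illustrated with a short sketch that splits a test into two half-tests, classifies each examinee against each half-test cut score, and tabulates the proportion classified consistently by both halves. The item data, the odd/even split, and the cut scores below are hypothetical, and the sketch omits the look-up-table step described in the report.

# Simplified sketch of the half-test cross-classification idea (illustrative only).
import numpy as np

def half_test_consistency(item_scores, half1_items, half2_items, cut1, cut2):
    """
    item_scores: 2-D array, one row per examinee, one column per item
                 (binary or nonbinary item scores both work).
    half1_items, half2_items: column indices defining the two half-tests.
    cut1, cut2: half-test cut scores; cut1 + cut2 equals the total-test cut score.
    Returns the proportion of examinees classified the same way (master or
    nonmaster) by both half-tests.
    """
    scores = np.asarray(item_scores, dtype=float)
    pass1 = scores[:, half1_items].sum(axis=1) >= cut1
    pass2 = scores[:, half2_items].sum(axis=1) >= cut2
    both_pass = np.mean(pass1 & pass2)
    both_fail = np.mean(~pass1 & ~pass2)
    return both_pass + both_fail

# Hypothetical 6-item binary test with a total cut score of 4, split as 2 + 2.
rng = np.random.default_rng(0)
responses = rng.integers(0, 2, size=(200, 6))
odd, even = [0, 2, 4], [1, 3, 5]            # odd/even item split
print(half_test_consistency(responses, odd, even, cut1=2, cut2=2))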
In this research, we investigated the feasibility of implementing the e-rater® scoring engine as a check score in place of all-human scoring for the Graduate Record Examinations® (GRE®) revised General Test (rGRE) Analytical Writing measure. This report provides the scientific basis for the use of e-rater as a check score in operational practice. We proceeded with the investigation in four phases. In phase I, for both argument and issue prompts, we investigated the quality of human scoring consistency across individual prompts as well as across two groups of prompts organized into sets. The sets were composed of prompts with separate focused questions (i.e., variants) that the writer must address in responding to the topic of the prompt; prompts were also grouped, for scoring purposes, by similar variants. Results showed adequate human scoring quality for model building and evaluation. In phase II, we investigated eight e-rater model variations each for argument and issue essays, including prompt-specific, variant-specific, variant-group-specific, and generic models, both with and without content features, at the rating level, the task score level, and the writing score level. Results showed that the generic model was a viable alternative to the prompt-specific, variant-specific, and variant-group-specific models, with and without the content features. In phase III, we evaluated the e-rater models on a recently tested group from the spring of 2012 (March 18 to June 18, 2012), following the introduction of scoring benchmarks. Results confirmed the feasibility of using a generic model at the rating, task score, and writing score levels, demonstrating reliable cross-task correlations as well as convergent and divergent validity. In phase IV of the study, we purposely introduced a bias to simulate the effects of training the model on a potentially less able group of test takers from the spring of 2012. Results showed that use of the check-score model increased the need for adjudications by 5% to 8%, yet the introduced bias actually increased the agreement of scores at the analytical writing score level with all-human scoring.
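For readers unfamiliar with check scoring, the following sketch illustrates the basic logic: the e-rater score never contributes to the reported score; it only flags essays whose human and machine scores disagree by more than a threshold so that an additional human rating can be requested (adjudication). The threshold, score scale, and averaging rule are assumptions for illustration; the operational GRE rules may differ.

# Minimal sketch of check-score logic (illustrative assumptions throughout).
ADJUDICATION_THRESHOLD = 1.0  # assumed |human - e-rater| gap that triggers review

def task_score_with_check(human1, erater, request_second_human):
    """
    human1: first human rating (0-6 scale assumed).
    erater: e-rater score for the same essay, used only as a quality check.
    request_second_human: callable that returns an additional human rating.
    Returns the task score and whether adjudication was triggered.
    """
    if abs(human1 - erater) > ADJUDICATION_THRESHOLD:
        human2 = request_second_human()
        return (human1 + human2) / 2.0, True   # reported score rests on humans only
    return float(human1), False

score, adjudicated = task_score_with_check(4, 2.7, request_second_human=lambda: 4)
print(score, adjudicated)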
In this research report, we present an empirical argument for the use of a contributory scoring approach for the two-essay writing assessment of the analytical writing section of the GRE® test, in which human and machine scores are combined to create scores at the task and section levels. The approach was designed to replace the currently operational all-human check scoring approach, in which machine scores are used solely as quality-control checks to determine when additional human ratings are needed because of unacceptably large score discrepancies. We use data from six samples of essays collected from test takers during operational administrations and special validity studies to empirically evaluate six different score computation methods. In presenting this work, we critically discuss key methodological design decisions and the rationales underlying them. We close the report by discussing how the research methodology is generalizable to other testing programs and use contexts. (Implementing a Contributory Scoring Approach for the GRE® Analytical Writing Section: A Comprehensive Empirical Investigation, ETS Research Report Series, doi:10.1002/ets2.12142. Keywords: automated essay scoring; check scoring approach; contributory scoring approach; GRE®; GRE® analytical writing; writing assessment; design decisions for automated scoring deployment; scoring methodology.) Automated essay scoring is a term that describes various artificial intelligence scoring technologies for extended writing tasks; it is employed in many large-scale testing programs (see Shermis & Hamner, 2013, for a comparison of different applications). Under an automated essay scoring approach, digitally submitted essays are scored automatically through the use of specialized software …
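A compact sketch of the contrast between the two approaches follows. The averaging and half-point rounding rules are illustrative assumptions and are not the specific score computation methods evaluated in the report, which compared six such methods.

# Illustrative contrast between check scoring and contributory scoring.
import math

def check_scoring_task_score(human, machine, second_human=None, threshold=1.0):
    """All-human check scoring: the machine score only triggers adjudication."""
    if abs(human - machine) > threshold and second_human is not None:
        return (human + second_human) / 2.0
    return float(human)

def contributory_task_score(human, machine):
    """Contributory scoring: human and machine scores are combined directly."""
    return (human + machine) / 2.0

def analytical_writing_score(issue_task, argument_task):
    """Section score: average the two task scores, rounded to the nearest half
    point (the rounding rule here is an assumption for illustration)."""
    avg = (issue_task + argument_task) / 2.0
    return math.floor(2 * avg + 0.5) / 2.0

issue = contributory_task_score(human=4, machine=4)
argument = contributory_task_score(human=5, machine=4.5)
print(analytical_writing_score(issue, argument))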
“A new system for transliterating Egyptian hieroglyphs graphematically.” The way in which Egyptologists transcribe non-Egyptian words in Egyptian texts is not only very inconsistent but also very inadequate. Although there have been several attempts to change this, none of the proposed systems has been convincing. Here we propose a new approach for transliterating lexemes written sign by sign in Egyptian, based on the transliterations used in different neighboring disciplines. Keywords: transliteration, syllabic writing, group writing (Gruppenschrift), foreign words, transcription.