2014
DOI: 10.1002/ets2.12022

A Study of the Use of the e‐rater® Scoring Engine for the Analytical Writing Measure of the GRE® revised General Test

Abstract: In this research, we investigated the feasibility of implementing the e‐rater® scoring engine as a check score in place of all‐human scoring for the Graduate Record Examinations® (GRE®) revised General Test (rGRE) Analytical Writing measure. This report provides the scientific basis for the use of e‐rater as a check score in operational practice. We proceeded with the investigation in four phases. In phase I, for both argument and issue prompts, we investigated the quality of human scoring consistency across i…

Cited by 5 publications (9 citation statements); references 16 publications.
“…In contrast, the primary adjudication threshold has been set at 1.5 for both the TOEFL independent task and the PRAXIS® test's argumentative task under a contributory scoring approach. Similarly, it was recently reset to 2.0 for the TOEFL integrated task, with the potentially undesirable effects of the larger primary threshold on score separation compensated for by giving human ratings twice the weight of the machine scores for reporting purposes (Breyer et al., 2014; Ramineni, Trapani, & Williamson, 2015; Ramineni, Trapani, Williamson, Davey, & Bridgeman, 2012b; Ramineni et al., 2012a).…”
Section: Determining the Primary Adjudication Threshold
confidence: 99%
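
The contributory scoring mechanics described in this statement (a primary adjudication threshold of 1.5 or 2.0 points, with human ratings given twice the weight of machine scores) can be illustrated with a short sketch. The function below is a hypothetical illustration under those stated rules, not ETS's operational implementation; the weighted-average formula and the adjudication fallback are assumptions extrapolated from the text.

from typing import Optional

def contributory_score(human: float, machine: float,
                       threshold: float = 2.0,
                       human_weight: float = 2.0) -> Optional[float]:
    """Illustrative contributory scoring: combine a human rating and
    a machine score, weighting the human rating human_weight times as
    heavily. Returns None when the human-machine discrepancy exceeds
    the adjudication threshold, signalling that the essay needs an
    additional rating."""
    if abs(human - machine) > threshold:
        return None  # adjudicate: route the essay to another rater
    # Weighted average; with human_weight = 2 the human rating counts
    # twice as much as the machine score, offsetting the larger
    # threshold's effect on score separation.
    return (human_weight * human + machine) / (human_weight + 1.0)

For example, contributory_score(4.0, 3.0) returns about 3.67, while a human/machine pair differing by more than 2.0 points returns None and would be adjudicated.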
“…The argument of critics is that “nonsense” and perhaps “obviously flawed” essays that result from gaming attempts can be detected by human readers but not always by built‐in machine detectors (i.e., advisories) in the automated scoring system (see Ramineni, Trapani, Williamson, Davey, & Bridgeman, or Breyer et al., for a description of the different advisories evaluated for the GRE‐AW section). For the purpose of this report, the sole machine scoring approach is not considered further for the GRE‐AW section because the consequential use of the GRE‐AW section scores is associated with relatively high stakes for individual test takers.…”
Section: Terminology and Motivation
confidence: 99%
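
As a companion to this critique, the gating role of advisories under a check-score model can be sketched as follows. The advisory labels and routing values below are invented for illustration; the actual e‐rater advisories are described in the works cited above.

def route_essay(machine_score, advisories):
    """Illustrative routing rule: if any built-in advisory fires
    (e.g., a suspected off-topic, excessively brief, or gamed
    response), the machine score is not trusted and the essay falls
    back to all-human scoring."""
    if advisories or machine_score is None:
        return "human_only"   # human readers catch what advisories may miss
    return "check_score"      # machine score may serve as a quality check

# A flagged essay is routed to human-only scoring:
assert route_essay(3.5, ["off_topic"]) == "human_only"
assert route_essay(3.5, []) == "check_score"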
“…The automated scoring models based on e‐rater have been successfully evaluated in recent years for the writing prompts included in the old GRE General Test (Ramineni, Trapani, Williamson, Davey, & Bridgeman), the TOEFL iBT® test (Attali, Bridgeman, & Trapani; Ramineni, Trapani, Williamson, Davey, & Bridgeman), and the GRE revised General Test (Breyer et al.). The current TOEFL iBT test uses e‐rater for operational scoring of the essay tasks, and the GRE uses e‐rater as a quality control on the reported human scores, thus allowing the programs to report scores efficiently and to use their human rater pool more effectively.…”
Section: Introduction
confidence: 99%
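
The check-score role described here, in which the human rating is reported and e‐rater serves only as a quality control, can be sketched in a few lines. This is a minimal sketch under assumed values: the 0.5-point trigger and the two-human averaging rule are illustrative placeholders, since the report derives the operational thresholds empirically.

from typing import Callable

def check_score(human1: float, machine: float,
                second_human: Callable[[], float],
                threshold: float = 0.5) -> float:
    """Illustrative check-score workflow: report the first human
    rating when the machine check agrees within threshold; otherwise
    obtain a second human rating and report the mean of the two human
    scores. The machine score never enters the reported score."""
    if abs(human1 - machine) <= threshold:
        return human1                  # machine confirms the human score
    human2 = second_human()            # discrepancy: second human rates
    return (human1 + human2) / 2.0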