2013
DOI: 10.1002/j.2333-8504.2013.tb02343.x
INVESTIGATING THE SUITABILITY OF IMPLEMENTING THE E‐RATER® SCORING ENGINE IN A LARGE‐SCALE ENGLISH LANGUAGE TESTING PROGRAM

Abstract: In this research, we investigated the suitability of implementing e‐rater® automated essay scoring in a high‐stakes large‐scale English language testing program. We examined the effectiveness of generic scoring and 2 variants of prompt‐based scoring approaches. Effectiveness was evaluated on a number of dimensions, including agreement between the automated and the human score and relations with criterion variables. Results showed that the sample size was generally not sufficient for prompt‐specific scoring. Fo…
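The abstract evaluates human–machine agreement as one dimension of effectiveness. In automated essay scoring research, such agreement is commonly summarized with quadratic weighted kappa over the two raters' integer scores. The following is a minimal illustrative sketch (the function name and the 6-point scale are assumptions for illustration, not taken from the report):

```python
from collections import Counter

def quadratic_weighted_kappa(human, machine, min_score=1, max_score=6):
    """Quadratic weighted kappa between two parallel integer score vectors.

    1.0 = perfect agreement; 0.0 = chance-level agreement.
    """
    n_cats = max_score - min_score + 1
    n = len(human)
    # Observed joint score counts.
    obs = [[0.0] * n_cats for _ in range(n_cats)]
    for h, m in zip(human, machine):
        obs[h - min_score][m - min_score] += 1
    # Marginal score histograms for the chance-agreement baseline.
    hist_h = Counter(h - min_score for h in human)
    hist_m = Counter(m - min_score for m in machine)
    num = den = 0.0
    for i in range(n_cats):
        for j in range(n_cats):
            # Quadratic disagreement weight, 0 on the diagonal.
            w = (i - j) ** 2 / (n_cats - 1) ** 2
            num += w * obs[i][j]
            den += w * hist_h[i] * hist_m[j] / n
    return 1.0 - num / den

human = [4, 3, 5, 4, 2, 4]
machine = [4, 4, 5, 3, 2, 4]
print(quadratic_weighted_kappa(human, machine))  # ≈ 0.8125
```

Note that quadratic weighting penalizes large human–machine discrepancies much more heavily than adjacent-score disagreements, which is why it is favored for ordinal holistic scales.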

Cited by 4 publications (2 citation statements); References 12 publications.
“…When the human–machine discrepancy exceeds a given threshold, a second human rating is solicited. Whereas the specific thresholds employed in operational settings have not been reported, prior research has evaluated thresholds from 0.5 to 1.5 on a 5‐ or 6‐point holistic scoring scale (e.g., Zhang, Breyer, & Lorenz, ).…”
Section: Unusual Responses in Automated Essay Scoring
confidence: 99%
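The citation statement above describes a standard adjudication rule: when the absolute human–machine score discrepancy exceeds a threshold, a second human rating is requested. A minimal sketch of that check, assuming a strict-inequality trigger and using 1.0 only as an illustrative threshold within the 0.5–1.5 range the prior research evaluated (the function name is hypothetical):

```python
def needs_adjudication(human_score, machine_score, threshold=1.0):
    """Flag a response for a second human rating when the absolute
    human-machine discrepancy exceeds the threshold.

    A discrepancy exactly equal to the threshold does not trigger
    adjudication under this strict-inequality convention.
    """
    return abs(human_score - machine_score) > threshold

# On a 6-point holistic scale with a threshold of 1.0:
print(needs_adjudication(4, 4))  # False: exact agreement
print(needs_adjudication(3, 5))  # True: discrepancy of 2 exceeds 1.0
```

Lowering the threshold toward 0.5 routes more responses to a second human rater (higher cost, tighter quality control); raising it toward 1.5 does the opposite.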
“…The effect of differences between human raters can substantially increase the bias in the final score without careful monitoring [8]. Manual correction makes human rating labor-intensive, time-consuming, and expensive [9]. Based on these problems, a computer assessment is needed to help facilitate the assessment.…”
Section: Introduction
confidence: 99%