2008
DOI: 10.1177/0265532207086780
Rater types in writing performance assessments: A classification approach to rater variability

Abstract: Research on rater effects in language performance assessments has provided ample evidence for a considerable degree of variability among raters. Building on this research, I advance the hypothesis that experienced raters fall into types or classes that are clearly distinguishable from one another with respect to the importance they attach to scoring criteria. To examine the rater type hypothesis, I asked 64 raters actively involved in scoring examinee writing performance on a large-scale assessment instrument …

Cited by 207 publications (151 citation statements)
References 39 publications
“…One analysis that might shed light on some of the differences across topics would be a many-faceted Rasch analysis using the FACETS software (Linacre, 2010; see also Myford & Wolfe, 2003, 2004, for details of this method of analysis), which can be used to estimate rater severity and task difficulty on the same linear scale, allowing investigation of questions such as whether specific raters judge essays on certain topics more severely than others. This analysis could provide more detailed information about rater bias, and along with e-rater feature scores could complement recent research on the factors that influence rater behavior (e.g., Eckes, 2008).…”
Section: Implications and Future Directions
confidence: 81%
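The many-facet Rasch analysis referred to in the quotation above can be summarized by the model's core equation. As a sketch in one common parameterization (the notation below is illustrative and not necessarily that used in the cited studies):

```latex
\log\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = \theta_n - \beta_i - \alpha_j - \tau_k
```

Here $P_{nijk}$ is the probability that examinee $n$ receives rating category $k$ rather than $k-1$ on task $i$ from rater $j$; $\theta_n$ denotes examinee ability, $\beta_i$ task difficulty, $\alpha_j$ rater severity, and $\tau_k$ the threshold of category $k$. Because all of these parameters are estimated on the same logit scale, rater severity and task difficulty become directly comparable, which is what makes the rater-bias analyses described in the quotation possible.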
“…One possible explanation for this result may be found in the research finding that essay raters do not base their scores strictly on the wording of a specific scale (see Eckes, 2008, for a recent review of the literature on rater behavior). For example, Lumley (2002) noted that raters' judgments seem to be based on "some complex and indefinable feeling about the text, rather than the scale content" and that raters form "a uniquely complex impression independently of the scale wordings."…”
Section: Discussion
confidence: 99%
“…Raters also judge students' writing ability differently depending on their academic background and sex (Vann, Lorenz & Meyer, 1991) and on the training they have received (Weigle, 1994). Studies such as Cumming (1990), Eckes (2008), Esfandiari & Myford (2013), González & Roux (2013), Lim (2011), Shi (2001), Shi, Wan, & Wen (2003), and Wiseman (2012) describe how distinct rater backgrounds influence (or do not influence) raters' rating behavior, actual scores, and scoring procedures. Lim (2011), for instance, focused on experienced and inexperienced raters.…”
Section: Introduction
confidence: 99%
“…Numerous researchers have modeled rater characteristics (for writing skills, e.g., Weigle, 1998; Engelhard & Myford, 2003; Eckes, 2005, 2008; Schoonen, 2005; for speaking skills, e.g., Vidakovic & Galaczi, 2009). These studies indicate that raters may differ in how they interpret and apply rating criteria, in how severely or leniently they judge examinees' language performance, in how they understand and use rating scales, and in how consistently they rate students at different proficiency levels.…”
Section: Communicative Language Skills and IRT Models