Evaluating CEFR rater performance through the analysis of spoken learner corpora

Huang, Lan-fen; Kubelec, Simon; Keng, Nicole; Hsu, Lung-hsun

doi:10.1186/s40468-018-0069-0

Cited by 34 publications

(11 citation statements)

References 21 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…It signifies that the severity practiced by the two groups was not identical. This finding is consistent with those of Attali (2016); Davis (2016) and Huang et al (2018), who reported that raters with varying rating experience provided heterogeneous ratings, even though the studies were implemented in different contexts. This consistency may be due to how rating experience among raters was operationally defined.…”

supporting

confidence: 91%

“…Empirically, conflicting findings emerged from the literature in terms of how raters' experience has impacted rating quality. Raters of different experiences were reported to show distinct rating quality in some studies (Davis 2016;Huang et al 2018;Kim 2015), but differences were not observed in other studies (Ahmadi Shirazi, 2019;Isaacs & Thomson, 2013;Şahan & Razı, 2020).…”

Section: Introductionmentioning

confidence: 94%

“…However, both groups of raters were successful in generating homogeneous ratings after training. Huang et al (2018) found out that raters showed different levels of severity when they are compared according to their first language within the context of language testing. Recently, Ahmadi Shirazi (2019) assigned raters of writing test to rate using two rating methods (holistic and analytical) and concluded that raters of writing test displayed different levels of severity and leniency.…”

Section: Raters' Rating Performancementioning

confidence: 99%

See 2 more Smart Citations

Rating Performance among Raters of Different Experience Through Multi-Facet Rasch Measurement (MFRM) Model

Noh

Matore

2020

Eğitimde Ve Psikolojide Ölçme Ve Değerlendirme Dergisi

View full text Add to dashboard Cite

One's experience can greatly contribute to a diversified rating performance in educational scoring. Heterogeneous ratings can negatively affect examinees' results. The aim of the study is to examine raters' rating performance in assessing oral tests among lower secondary school students using Multi-facet Rasch Measurement (MFRM) model indicated by raters' severity. Respondents are thirty English Language teachers clustered into two groups based on their rating experience in high-stakes assessment. The respondents listened to ten examinees' recorded answers of three oral test items and provided their ratings. Instruments include items, examinees' answers, scoring rubric, and scoring sheet used to appraise examinees' competence in three domains which are vocabulary, grammar, and communicative competence. MFRM analysis showed that raters exhibited diversity in their severity level with chisquare χ 2 =2.661. Raters' severity measures ranged from 2.13 to -1.45 logits. Independent t-test indicated that there was a significant difference in ratings provided by the inexperienced and the experienced raters, t-value = -0.96, df = 28, p<0.01. The findings of this study suggest that assessment developers must ensure raters are well versed before they can rate examinees in operational settings gained through assessment practices or rater training. Further research is needed to account for the varying effects of rating experience in other assessment contexts and the effects of interaction between facets on estimates of examinees' measures. The present study provides additional evidence with respect to the role of rating experience in inspiring raters to provide accurate ratings.

show abstract

supporting

confidence: 91%

Section: Introductionmentioning

confidence: 94%

Section: Raters' Rating Performancementioning

confidence: 99%

See 1 more Smart Citation

Rating Performance among Raters of Different Experience Through Multi-Facet Rasch Measurement (MFRM) Model

Noh

Matore

2020

Eğitimde Ve Psikolojide Ölçme Ve Değerlendirme Dergisi

View full text Add to dashboard Cite

show abstract

“…It is not a common practice to evaluate and monitor the teacher's or rater's performance in IEPs (Huang, et al, 2018). It is thought by the author of this paper that there are some reasons behind it.…”

Section: Problemmentioning

confidence: 96%

The Rater Performance Categorization System (RPCS) for Intensive English Programs

Şahin

2021

education

View full text Add to dashboard Cite

There are several student performance are assessed in Intensive English Programs (IEP) worldwide in each academic year. These student performances are mostly graded by human raters with a certain degree of error. However, the accuracy of these performance assessment is of utmost importance because they feed data into some high stakes decisions about the students and such performance assessments constitute a large number of students’ scores. Therefore, the accuracy of these performance assessments should be given priority by the IEPs. However, when the current rater performance monitors systems which can help the administrators of IEPs to monitor rater performance in performance assessment are away from practicality because they require the use of complex mathematical models and specialized software. A practical and easy to maintain rater performance categorization system is proposed in this paper and it was accompanied by a sample study Its benefits to the administrators of IEPs and their raters are also discussed besides its practical considerations.

show abstract

“…It was learned that raters' severity remained consistent before and after the training yet level of agreement among raters improved at the end of the scoring sessions. Bijani (2018) and Huang, Kubelec, Keng, & Hsu (2018) have chosen samples from experienced raters and inexperienced raters. Both studies concluded that the two groups of raters were able to attain the same standard of inter-rater reliability after rater training was given.…”

Section: Factors Influencing Rater Accuracy In Language Testingmentioning

confidence: 99%

Brunswik’s Lens Model: This Is How to Inspire Accurate Raters

Noh¹,

Matore²

2019

View full text Add to dashboard Cite

Rating accuracy is one of the fundamental standards in educational assessment to ensure the quality and integrity. Inaccuracy in academic assessment engenders negative implications towards student's motivation and raters' credibility. Therefore, this paper seeks to provide a discussion on rating accuracy in educational assessment based on Brunswik's lens model. The model contends that raters' ratings are not completed directly but through the existence of many factors including raters' variability, rating scales and domains assessed. Raters' idiosyncrasy is scrutinized by describing varied sources that can threaten rating accuracy. This model explains how intervening factors moderate the relationship between candidates' capabilities and observed scores. The discussion may shed some light on the endeavors to inspire raters to be effective and uphold the values of reliable raters through the implementation of thoughtful rater training that incorporates scoring practices, exposure on rater bias and self-directed reflection. Future attempts are necessary for understanding the interaction among intervening factors that influence raters and differences of rating accuracy produced by internal and external raters.

show abstract

Evaluating CEFR rater performance through the analysis of spoken learner corpora

Cited by 34 publications

References 21 publications

Rating Performance among Raters of Different Experience Through Multi-Facet Rasch Measurement (MFRM) Model

Rating Performance among Raters of Different Experience Through Multi-Facet Rasch Measurement (MFRM) Model

The Rater Performance Categorization System (RPCS) for Intensive English Programs

Brunswik’s Lens Model: This Is How to Inspire Accurate Raters

Contact Info

Product

Resources

About