Writing scale effects on raters: an exploratory study

Jeong, Heejeong

doi:10.1186/s40468-019-0097-4

Cited by 4 publications

(13 citation statements)

References 21 publications


“…According to the observations, raters felt more confusion in their ratings which implied that they got it quite demanding to clearly select one level. This finding is in line with Jeong (2019) who concluded that raters had to consider multiple areas in the analytic rating process and experienced more hesitation and rating conflict. Therefore, these strategies like anchoring happened more in analytic evaluations which can be due to the vague wording of the scale.…”

Section: Discussion (supporting)

confidence: 89%

“…Thanks to this, it might be difficult for raters to analyze and interpret the scale components, and through adopting those strategies, they lessened the cognitive load. On the other hand, findings of Jeong (2019) showed that the scale design has a greater effect on the raters. Thus, designing rating scales and the rating criteria should provide an explicit and reliable foundation for scoring judgments, over and above distinguishing writing performance levels (Weigle, 2002).…”

Section: Discussion (mentioning)

confidence: 98%

“…The results illustrated that raters considered some features such as ideas and text structure more prominent than others. Jeong (2019) made a comparison between the effects of a binary scale and an analytic scale on rating performance and scores while using two groups of participants: teacher raters and expert raters. He found that because the binary scale lessened the rater cognitive load and was easy to use, the raters were more consistent in their ratings.…”

Section: Introduction (mentioning)

confidence: 99%


Raters’ perceptions of rating scales criteria and its effect on the process and outcome of their rating

2022


It is widely believed that human rating performance is influenced by an array of different factors. Among these, rater-related variables such as experience, language background, perceptions, and attitudes have been mentioned. One of the important rater-related factors is the way the raters interact with the rating scales. In particular, how raters perceive the components of the scales to further plan their scoring seems important. For this aim, the present study investigated the raters’ perceptions of the rating scales and their subsequent rating behaviors for two analytic and holistic rating scales. Hence, nine highly experienced raters were asked to verbalize their thoughts while rating student essays using IELTS holistic scale and the analytic scale of ESL Composition Profile. Upon analyzing the think-aloud protocols, four themes emerged. The findings showed that when rating holistically, the raters either referred to the holistic scale components to validate their ratings (validation) or had a pre-evaluation reading to rate in a more reliable way (dominancy). In analytic rating, on the other hand, the raters used a pre-evaluation scale reading in order to keep the components and their criteria to memory to evaluate the text more accurately (dominancy) or regularly moved between the text and the scale components to assign a score (oscillation). Furthermore, the results of a Wilcoxon signed-rank test showed that when using the holistic and analytic rating scales, the raters assigned significantly different scores to the texts. On the whole, the results revealed that the way the raters perceived the scale components will affect their judgement of the texts. The study also provides several implications for rater training programs and EFL writing assessment.


“…One examines rater scores quantitatively to detect rater effects; the other examines rater behavior qualitatively to explain how rater effects arise. Rating scores are typically analyzed by MFRM analysis (Jeong, 2019; Myford & Wolfe, 2003; Schaefer, 2008; Trace et al., 2017; Wang et al., 2017). Its central idea is to model many facets (e.g., examinee, criterion, and rater) in the rating process to predict the probability of a particular rating.…”

Section: Current Approaches To Examining Rater Scores and Rater Behavior (mentioning)

confidence: 99%
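
The idea in the statement above, that an MFRM models several facets to predict the probability of a particular rating, can be illustrated with a minimal sketch. It assumes the common rating-scale form of the many-facet Rasch model, in which the log-odds of category k over category k-1 equal examinee ability minus criterion difficulty minus rater severity minus a category threshold. The function name, the helper, and all numeric values are hypothetical and are not taken from any of the cited studies.

```python
import math

def mfrm_category_probs(ability, criterion_difficulty, rater_severity, thresholds):
    """Return P(rating = k) for k = 0..len(thresholds).

    Assumes a rating-scale many-facet Rasch model:
    log(P_k / P_{k-1}) = ability - criterion_difficulty - rater_severity - thresholds[k-1]
    All quantities are in logits.
    """
    eta = ability - criterion_difficulty - rater_severity
    # Log-numerator of each category; category 0 is fixed at 0.
    log_num = [0.0]
    for tau in thresholds:
        log_num.append(log_num[-1] + eta - tau)
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(log_num)
    weights = [math.exp(v - peak) for v in log_num]
    total = sum(weights)
    return [w / total for w in weights]

def expected_score(probs):
    """Expected rating given the category probabilities."""
    return sum(k * p for k, p in enumerate(probs))

# Hypothetical example: the same examinee and criterion, two raters.
thresholds = [-1.0, 0.0, 1.0]  # four score categories, 0..3
lenient = mfrm_category_probs(1.0, 0.0, -0.5, thresholds)
severe = mfrm_category_probs(1.0, 0.0, 0.5, thresholds)
print([round(p, 3) for p in lenient])
print([round(p, 3) for p in severe])
```

Under these assumptions, a more severe rater shifts probability toward lower categories, so the expected score for the same essay drops. Dedicated MFRM software estimates the facet parameters from data; this sketch only evaluates the model for given parameter values.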

“…Whereas the former represents the product of scoring, the latter evidences the process of scoring (Greene et al., 1989). Rater scores are typically examined through many-facet Rasch measurement (MFRM) (e.g., Jeong, 2019; Myford & Wolfe, 2003; Schaefer, 2008), while rater behavior tends to be examined through surveys and/or interviews (e.g., Jeong, 2019), think-aloud protocols (e.g., Barkaoui, 2007), rater comments (e.g., H. J. Kim, 2015), or eye-tracking experiments (e.g., Ballard, 2017; Winke & Lim, 2015).…”

mentioning

confidence: 99%

Triangulating natural language processing (NLP)-based analysis of rater comments and many-facet Rasch measurement (MFRM): An innovative approach to investigating raters’ application of rating scales in writing assessment

Cai, Yan

2023

Language Testing


Rater comments tend to be qualitatively analyzed to indicate raters’ application of rating scales. This study applied natural language processing (NLP) techniques to quantify meaningful, behavioral information from a corpus of rater comments and triangulated that information with a many-facet Rasch measurement (MFRM) analysis of rater scores. The data consisted of ratings on 987 essays by 36 raters (a total of 3948 analytic scores and 1974 rater comments) on a post-admission English Placement Test (EPT) at a large US university. We computed a set of comment-based features based on the analytic components and evaluative language the raters used to infer whether raters were aligned to the scale. For data triangulation, we performed correlation analyses between the MFRM measures of rater performance and the comment-based measures. Although the EPT raters showed overall satisfactory performance, we found meaningful associations between rater comments and performance features. In particular, raters with higher precision and fit to what the Rasch model predicts used more analytic components and used evaluative language more similar to the scale descriptors. These findings suggest that NLP techniques have the potential to help language testers analyze rater comments and understand rater behavior.


Experienced but detached from reality: Theorizing and operationalizing the relationship between experience and rater effects

2023

