<p>This study investigates the extent to which raters exhibit tendencies toward severity, leniency, or bias when evaluating students' writing compositions in Indonesia. Data were collected from 15 student essays scored by four raters holding master's degrees in English education. Many-facet Rasch measurement (MFRM), implemented in the Minifac software program, was used for data analysis. The assessment process was decomposed into its distinct facets: raters, essay items, and the specific traits or criteria in the writing rubric. Each rater's level of severity or leniency, essentially how strictly or generously they assigned scores, was scrutinized, and the potential biases raters might introduce into the grading process were examined. The findings revealed that, while the raters used the rubric consistently across all test takers, they varied in how lenient or severe they were. Scores of 70 were awarded more frequently than any other score. Based on these findings, composition raters may differ in how they rate students, potentially leading to student dissatisfaction, particularly when raters adopt severe scoring. The bias analysis showed that certain raters consistently scored particular items inaccurately, deviating from the established criteria (traits). Furthermore, the study found that having more than four items/criteria (content, diction, structure, and mechanics) is essential to achieve a more diverse distribution of item difficulty and to measure students' writing abilities effectively. These results can help writing departments improve oversight of inter-rater reliability and rating consistency. Implementing rater training is suggested as the most feasible way to ensure more dependable and consistent evaluations.</p>
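<p>For reference, a common formulation of the three-facet Rasch model that underlies this kind of analysis (as implemented in Facets/Minifac) models the log-odds of a script receiving rating category k rather than k−1 in terms of writer ability, item (trait) difficulty, and rater severity; the exact facet specification and threshold structure used in this particular study may differ:</p>

\[
\log\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = B_n - D_i - C_j - F_k
\]

<p>where \(B_n\) is the ability of student \(n\), \(D_i\) the difficulty of item (trait) \(i\), \(C_j\) the severity of rater \(j\), and \(F_k\) the difficulty of rating category \(k\) relative to category \(k-1\). Rater bias is then typically investigated through interaction (bias) analyses between the rater facet and the item or examinee facets.</p>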