Rating experience can contribute substantially to variability in rating performance in educational scoring, and heterogeneous ratings can negatively affect examinees' results. The aim of this study is to examine raters' rating performance, indicated by rater severity, in assessing oral tests among lower secondary school students, using the Multi-facet Rasch Measurement (MFRM) model. Respondents were thirty English Language teachers clustered into two groups based on their rating experience in high-stakes assessment. The respondents listened to ten examinees' recorded answers to three oral test items and provided their ratings. The instruments included the items, the examinees' answers, a scoring rubric, and a scoring sheet used to appraise examinees' competence in three domains: vocabulary, grammar, and communicative competence. MFRM analysis showed that raters varied in their severity, with chi-square χ² = 2.661; raters' severity measures ranged from −1.45 to 2.13 logits. An independent t-test indicated a significant difference between the ratings provided by the inexperienced and the experienced raters, t = −0.96, df = 28, p < 0.01. The findings suggest that assessment developers must ensure raters are well versed, whether through assessment practice or rater training, before they rate examinees in operational settings. Further research is needed to account for the varying effects of rating experience in other assessment contexts and for the effects of interaction between facets on estimates of examinees' measures. The present study provides additional evidence on the role of rating experience in enabling raters to provide accurate ratings.
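For reference, rater severity in MFRM is conventionally estimated with the many-facet Rasch model in Linacre's formulation; the following is the standard specification, offered as a sketch rather than the study's exact model:

$$\log\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = \theta_n - \delta_i - \alpha_j - \tau_k$$

where $\theta_n$ is the ability of examinee $n$, $\delta_i$ is the difficulty of item $i$, $\alpha_j$ is the severity of rater $j$, and $\tau_k$ is the threshold of rating category $k$ relative to category $k-1$, all expressed on a common logit scale. Under this specification, a positive $\alpha_j$ marks a harsher rater and a negative $\alpha_j$ a more lenient one, which is how severity ranges such as the one reported above are read.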
Rating accuracy is one of the fundamental standards in educational assessment, ensuring its quality and integrity. Inaccuracy in academic assessment has negative implications for students' motivation and raters' credibility. This paper therefore discusses rating accuracy in educational assessment based on Brunswik's lens model. The model contends that ratings are not made directly but are mediated by many factors, including rater variability, rating scales, and the domains assessed. Rater idiosyncrasy is scrutinized by describing the varied sources that can threaten rating accuracy. The model explains how intervening factors moderate the relationship between candidates' capabilities and observed scores. The discussion may shed some light on efforts to make raters effective and to uphold the values of reliable rating through thoughtful rater training that incorporates scoring practice, exposure to rater bias, and self-directed reflection. Future work is needed to understand the interaction among the intervening factors that influence raters and the differences in rating accuracy between internal and external raters.
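For readers unfamiliar with the framework, the mediated relationship between a criterion (the candidate's actual capability) and a judge's ratings is commonly formalized by the lens model equation in Tucker's formulation; this is the standard statement of the model, not necessarily the notation used in the paper:

$$r_a = G\,R_e\,R_s + C\sqrt{1 - R_e^2}\,\sqrt{1 - R_s^2}$$

where $r_a$ is the achievement correlation between ratings and the criterion, $R_e$ is the predictability of the criterion from the observable cues, $R_s$ is the consistency with which the rater uses those cues, $G$ is the correspondence between the environment-side and rater-side linear models, and $C$ is the correlation of their residuals. The equation makes explicit how intervening factors (cue validity and rater consistency) bound the accuracy a rater can achieve.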