The purpose of this research project was to confirm that differences in the severity of judges and the stringency of grading periods occur, regardless of the nature of the assessment or the examination materials used. Three rather different examinations that require judges were analyzed, using an extended Rasch model to determine whether differences in judge severity and grading-period stringency were observable for all three examinations. Significant variation in judge severity and some variation across grading periods were found on all three examinations. This implies that regardless of the nature of the examination, items, or judges, examinee measures cannot be considered independent of the particular judges involved unless correction for severity is made systematically. Accounting for judge severity and grading-period stringency is extremely important when pass/fail decisions that are meant to generalize to competence are made, as in certification examinations.
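For orientation, the kind of extended Rasch model this abstract refers to is Linacre's many-facet model; the following is a sketch with a grading-period facet added, not necessarily the study's exact parameterization:

    \log \frac{P_{njpik}}{P_{njpi(k-1)}} = B_n - C_j - S_p - D_i - F_k

where B_n is the ability of examinee n, C_j the severity of judge j, S_p the stringency of grading period p, D_i the difficulty of item i, and F_k the difficulty of rating step k relative to step k-1.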
This paper presents a method for analyzing oral examinations with an extended, many-faceted Rasch model that calibrates medical specialty candidates, protocols, and raters. Significant variance was found among protocol difficulties and rater severities. When candidates' raw scores were compared with calibrated measures corrected for the bias caused by the particular protocols and raters encountered, variation between candidate scores and measures was observed. The data were found to fit the Rasch model well enough to be suitable for making measurement on oral examinations more objective and for providing specific feedback to oral examination raters. In this example a medical oral examination was used; however, the techniques are applicable to any situation in which trained professionals rate candidate or patient performances. For occupational therapists, potential applications include evaluation of a student's fieldwork performance or observation of a patient's task performance.
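As an illustration of the calibration idea (not the FACETS program itself), the following Python sketch fits a three-facet dichotomous Rasch model to simulated candidate-by-protocol-by-rater data by joint maximum likelihood. The simulated design, the 50 percent missing-data pattern, and the estimation details are illustrative assumptions, not the authors' procedure.

    import numpy as np

    rng = np.random.default_rng(0)

    # Simulated three-facet design: candidate x protocol x rater, dichotomous ratings.
    n_cand, n_prot, n_rater = 200, 6, 5
    B = rng.normal(0.0, 1.0, n_cand)    # candidate ability (logits)
    D = rng.normal(0.0, 0.7, n_prot)    # protocol difficulty
    C = rng.normal(0.0, 0.7, n_rater)   # rater severity

    logit = B[:, None, None] - D[None, :, None] - C[None, None, :]
    X = (rng.random(logit.shape) < 1.0 / (1.0 + np.exp(-logit))).astype(float)

    # Incomplete design: each candidate meets a random half of protocol-rater pairs.
    M = rng.random(X.shape) < 0.5

    # Drop candidates with extreme (all-0 or all-1) observed scores; their joint
    # maximum likelihood estimates are infinite.
    obs = M.sum(axis=(1, 2))
    tot = (X * M).sum(axis=(1, 2))
    keep = (tot > 0) & (tot < obs)
    X, M = X[keep], M[keep]

    # Joint maximum likelihood: one Newton step per facet per iteration,
    # centering protocol and rater facets to fix the scale origin.
    b = np.zeros(keep.sum())
    d = np.zeros(n_prot)
    c = np.zeros(n_rater)
    for _ in range(200):
        E = 1.0 / (1.0 + np.exp(-(b[:, None, None] - d[None, :, None] - c[None, None, :])))
        W = E * (1.0 - E) * M    # information per observed cell
        R = (X - E) * M          # residual per observed cell
        b += R.sum(axis=(1, 2)) / W.sum(axis=(1, 2))
        d -= R.sum(axis=(0, 2)) / W.sum(axis=(0, 2)); d -= d.mean()
        c -= R.sum(axis=(0, 1)) / W.sum(axis=(0, 1)); c -= c.mean()

    # Raw percent-correct vs. bias-corrected measure: in an incomplete design the
    # two can rank candidates differently, which is the point of the comparison.
    raw = (X * M).sum(axis=(1, 2)) / M.sum(axis=(1, 2))
    print(np.corrcoef(raw, b)[0, 1])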
The purpose of this article is to discuss the importance of decision reproducibility for performance assessments. When decisions from two judges about a student's performance on comparable tasks correlate, those decisions have conventionally been considered reproducible. However, when judges differ in expectations and tasks differ in difficulty, decisions may not be independent of the particular judges or tasks encountered unless appropriate adjustments for the observable differences are made. In this study, data analyzed with the Facets model provided evidence that judges grade differently, whether or not the scores they give correlate well. This outcome suggests that adjustments for differences among judge severities should be made before student measures are estimated, to produce reproducible decisions for certification, achievement, or promotion.
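The central point here can be seen with invented numbers: two judges whose ratings correlate perfectly can still differ systematically in severity, so correlation alone cannot certify reproducibility.

    import numpy as np

    judge_a = np.array([6, 7, 8, 9, 10])  # hypothetical ratings from a lenient judge
    judge_b = judge_a - 2                  # a harsher judge: same ordering, lower scores

    print(np.corrcoef(judge_a, judge_b)[0, 1])  # 1.0: perfect correlation
    print(judge_a.mean() - judge_b.mean())      # 2.0: a constant severity gap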
This article discusses the use of the Many-Facets Rasch Model, via the FACETS computer program (Linacre, 2006a), to scale job/practice analysis survey data and to combine multiple rating scales into single composite weights representing the tasks' relative importance. Results from the Many-Facets Rasch Model are compared with those calculated from the Rasch Rating Scale Model (RRSM) (Spray & Huang, 2000) using two examples of actual job analysis data from diverse professions. In addition, this article proposes a solution for establishing the origin of the percentage scale when transforming task importance weights from logit units into percentage weights. Although the resulting test specifications from the two compared methods are not radically different, a case is made that the use of the Many-Facets Rasch Model with a zero point based on the frequency rating scale provides a more justifiable basis for combining multiple rating scales and transforming task survey data. In addition, this study found that the Many-Facets Rasch Model can better accommodate missing data than the RRSM in situations in which respondents rate only subsets of the multiple scales, rather than all of the scales, for the tasks being surveyed.
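A hedged sketch of the final transformation this abstract describes follows; the task measures, the zero-point value, and the clipping-and-normalizing rule below are invented for illustration, whereas the article derives its actual zero point from the frequency rating scale.

    import numpy as np

    # Hypothetical composite task calibrations in logits (invented numbers).
    task_logits = np.array([1.2, 0.4, -0.1, -0.8, -1.5])

    # Assumed origin of the percentage scale: tasks at or below this logit value
    # receive no weight. Illustrative only; the article's zero point comes from
    # the frequency rating scale.
    zero_point = -2.0

    shifted = np.clip(task_logits - zero_point, 0.0, None)
    weights = 100.0 * shifted / shifted.sum()
    print(weights.round(1))  # percentage weights summing to 100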