2019
DOI: 10.1111/jedm.12226
Conceptualizing Rater Judgments and Rating Processes for Rater‐Mediated Assessments

Abstract: Rater‐mediated assessments exhibit scoring challenges due to the involvement of human raters. The quality of human ratings largely determines the reliability, validity, and fairness of the assessment process. Our research recommends that the evaluation of ratings should be based on two aspects: a theoretical model of human judgment and an appropriate measurement model for evaluating these judgments. In rater‐mediated assessments, the underlying constructs and response processes may require the use of different…

Cited by 11 publications (6 citation statements)
References 52 publications
“…The numbers of ratees (500 or 1,000) and assessment criteria (3 or 5) were manipulated across the simulation conditions. Note that we applied assessment criteria rather than items that have been commonly utilized in IRT models to describe the characteristics of evaluation indicators because assessment criteria or domains are more frequently employed than items in the literature on rater-mediated assessments (e.g., Jin & Wang, 2018; Wang & Engelhard, 2019). Each criterion was judged by raters on a five-point rating scale.…”
Section: Methods
confidence: 99%
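The quoted design crosses two factors (500 or 1,000 ratees; 3 or 5 criteria, each on a five-point scale). A minimal sketch of that data layout, using uniform draws as a stand-in for whatever response model the cited study actually used (function and variable names here are illustrative, not from the source):

```python
import itertools
import random

def simulate_ratings(n_ratees, n_criteria, n_categories=5, seed=0):
    """Draw one rating per ratee x criterion on a 1..n_categories scale.

    Uniform draws only illustrate the shape of the simulated data;
    they are not the generating model from the cited simulation.
    """
    rng = random.Random(seed)
    return [[rng.randint(1, n_categories) for _ in range(n_criteria)]
            for _ in range(n_ratees)]

# The four crossed conditions manipulated in the quoted design.
conditions = list(itertools.product([500, 1000], [3, 5]))
datasets = {cond: simulate_ratings(*cond) for cond in conditions}
```

Each condition yields a ratees-by-criteria grid of ordinal scores, which is the input structure the rater models in this literature operate on.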
“…The numbers of ratees (500 or 1,000) and assessment criteria (3 or 5) were manipulated across the simulation conditions. Note that we applied assessment criteria rather than items that have been commonly utilized in IRT models to describe the characteristics of evaluation indicators because assessment criteria or domains are more frequently employed than items in the literature on rater-mediated assessments (e.g., Jin & Wang, 2018 ; Wang & Engelhard, 2019 ). Each criterion was judged by raters on a five-point rating scale.…”
Section: Methodsmentioning
confidence: 99%
“…Like MICHI estimation, the row averaging method [107][108][109] relies on bivariate frequencies f ij (see Equation (7)). A matrix B with entries b ij = log( f ij / f ji ) is formed.…”
Section: Row Averaging Methods (RA)
confidence: 99%
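The matrix construction quoted above can be sketched directly: form B with b_ij = log(f_ij / f_ji) from a bivariate frequency matrix. This is only the B-forming step (the cited methods go on to average rows, and handle zero frequencies in ways not shown here); the function name is illustrative:

```python
import numpy as np

def log_odds_matrix(f):
    """Form B with b_ij = log(f_ij / f_ji) from bivariate frequencies f.

    Cells where f_ij or f_ji is zero come out as NaN or +/-inf here;
    the cited row-averaging methods treat such cells specially, so this
    is a sketch of the definition, not a full implementation.
    """
    f = np.asarray(f, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        b = np.log(f / f.T)
    return b

# Toy frequency matrix: f[i, j] counts how often i beat/preceded j.
f = np.array([[0, 4, 2],
              [2, 0, 6],
              [1, 3, 0]])
B = log_odds_matrix(f)
```

Because f_ij / f_ji is the reciprocal of f_ji / f_ij, B is skew-symmetric (b_ij = -b_ji) wherever both frequencies are positive, which is what makes simple row averaging a sensible estimator.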
“…Linacre (1989) developed MFRM to accommodate a many-faceted data structure, especially for introducing raters as a third facet in addition to students and items in educational testing settings. MFRM is widely used for examining rater effects in performance assessments (Engelhard, 1994; Engelhard, 1996; Myford & Wolfe, 2003, 2004; Wang & Engelhard, 2019a; Wind & Engelhard, 2012; Wolfe, 2004; Wolfe & McVay, 2012).…”
Section: A Lens Model for Judgment Process in Creativity Assessments
confidence: 99%
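The many-facet Rasch model (MFRM) referenced in the statement above is commonly written in a rating-scale form along the following lines (this is a standard textbook formulation, not quoted from the source; symbol choices are ours):

```latex
\log\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right)
  = \theta_n - \delta_i - \lambda_j - \tau_k
```

where $P_{nijk}$ is the probability that student $n$ receives category $k$ rather than $k-1$ on item $i$ from rater $j$, $\theta_n$ is student ability, $\delta_i$ item difficulty, $\lambda_j$ rater severity (the third facet Linacre introduced), and $\tau_k$ the threshold for category $k$.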