“…In previous studies of rater-mediated assessments, researchers have proposed numerous indicators of rating quality that reflect various perspectives on what constitutes evidence of high-quality ratings, such as indicators of rater consistency (i.e., agreement or reliability) or fit to a measurement model (e.g., Meadows & Billington, 2005; Myford & Wolfe, 2003, 2004). As part of evaluating the fairness of rater-mediated assessments, many researchers have studied differential rater functioning (DRF), or raters' tendency to apply different levels of severity when they assess students in different subgroups (Engelhard, 2008; Goodwin, 2016; Kondo-Brown, 2002; Springer & Bradley, 2018; Wesolowski, Wind, & Engelhard, 2015; Winke, Gass, & Myford, 2012). When raters exhibit DRF, they may systematically underestimate or overestimate student achievement, depending on the subgroup to which a student belongs.…”
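To make the definition of DRF concrete, the following is a simplified descriptive sketch. It is not the model-based DRF analysis used in the cited studies. It assumes a hypothetical data layout of `(rater, student, subgroup, score)` tuples and hypothetical function names `severity_by_subgroup` and `drf_gap`. Each rating is compared with the mean score that the same student received from all raters. A rater whose average residual is lower for one subgroup than for another is relatively more severe toward the first subgroup.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy data: (rater, student, subgroup, score).
# Students s1, s2 belong to subgroup "A"; s3, s4 belong to subgroup "B".
EXAMPLE_RATINGS = [
    ("R1", "s1", "A", 2), ("R1", "s2", "A", 2), ("R1", "s3", "B", 4), ("R1", "s4", "B", 4),
    ("R2", "s1", "A", 4), ("R2", "s2", "A", 4), ("R2", "s3", "B", 4), ("R2", "s4", "B", 4),
]


def severity_by_subgroup(ratings):
    """Return {rater: {subgroup: mean residual}}.

    A residual is a rater's score minus the mean score that the same
    student received from all raters. Negative values mean the rater
    scored students lower than other raters did (more severe).
    """
    scores_by_student = defaultdict(list)
    for _rater, student, _group, score in ratings:
        scores_by_student[student].append(score)
    student_mean = {s: mean(v) for s, v in scores_by_student.items()}

    residuals = defaultdict(lambda: defaultdict(list))
    for rater, student, group, score in ratings:
        residuals[rater][group].append(score - student_mean[student])

    return {
        rater: {group: mean(vals) for group, vals in groups.items()}
        for rater, groups in residuals.items()
    }


def drf_gap(profile, group_a, group_b):
    """Difference in mean residual between two subgroups for one rater.

    A value far from zero suggests the rater applies different levels of
    severity to the two subgroups (a descriptive signal of possible DRF).
    """
    return profile[group_a] - profile[group_b]


if __name__ == "__main__":
    profiles = severity_by_subgroup(EXAMPLE_RATINGS)
    for rater, profile in sorted(profiles.items()):
        print(rater, profile, "gap A-B:", drf_gap(profile, "A", "B"))
```

In the toy data, R1 scores subgroup A students one point below their average across raters and subgroup B students at their average, giving a gap of -1 (more severe toward A). R2 shows the mirror image. Raw gaps like these ignore measurement error and sample size, so operational DRF analyses typically rely on measurement models and statistical tests rather than simple residual means.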