2001
DOI: 10.1111/j.1745-3984.2001.tb01119.x

Real‐Time Feedback on Rater Drift in Constructed‐Response Items: An Example From the Golden State Examination

Abstract: In this study, patterns of variation in the severities of a group of raters over time (so-called “rater drift”) were examined while the raters scored an essay written under examination conditions. At the same time, feedback was given to rater leaders (called “table leaders”), who then interpreted the feedback and reported to the raters. Rater severities in five successive periods were estimated using a modified linear logistic test model (LLTM; Fischer, 1973) approach. It was found that the raters did indeed drift toward…
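The abstract does not give the model equations. To show roughly what per-period severity estimates look like, the sketch below fits a much simpler model than the authors' modified LLTM: a Rasch-type model for dichotomous scores, P(score = 1) = logistic(θ − δ), where each rater's severity δ is estimated separately for each period by Newton–Raphson, assuming the examinee abilities θ are already known. The function names, the data layout, the dichotomous scoring, and the known-ability assumption are all hypothetical simplifications; the study itself scored essays and estimated severities within its modified LLTM framework.

```python
import math
from collections import defaultdict


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def estimate_severity(abilities, scores, iters=100, tol=1e-12):
    """Maximum-likelihood severity delta under P(score=1) = logistic(theta - delta).

    Hypothetical simplification: abilities (theta) are treated as known.
    The estimate does not exist if every score is 0 or every score is 1.
    """
    if all(s == scores[0] for s in scores):
        raise ValueError("severity is not estimable when all scores are identical")
    delta = 0.0
    for _ in range(iters):
        probs = [logistic(t - delta) for t in abilities]
        # First derivative of the log-likelihood with respect to delta: sum(p - x).
        grad = sum(p - x for p, x in zip(probs, scores))
        # Negative second derivative (Fisher information): sum(p * (1 - p)).
        info = sum(p * (1.0 - p) for p in probs)
        step = grad / info
        delta += step  # Newton-Raphson update
        if abs(step) < tol:
            break
    return delta


def severities_by_period(records):
    """records: iterable of (rater, period, theta, score) tuples (hypothetical layout)."""
    groups = defaultdict(lambda: ([], []))
    for rater, period, theta, score in records:
        groups[(rater, period)][0].append(theta)
        groups[(rater, period)][1].append(score)
    return {key: estimate_severity(th, sc) for key, (th, sc) in groups.items()}


# Toy data: rater "A" awards half the points in period 1 but only a quarter in period 2.
records = [("A", 1, 0.0, s) for s in (1, 1, 0, 0)] + [("A", 2, 0.0, s) for s in (1, 0, 0, 0)]
severity = severities_by_period(records)
drift = severity[("A", 2)] - severity[("A", 1)]
print(f"severity change from period 1 to 2: {drift:.3f} logits")  # prints 1.099 (= ln 3)
```

A positive change means the rater became more severe between periods. A real drift analysis would estimate abilities, item parameters, and all period-specific severities jointly rather than one group at a time.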

Cited by 44 publications (46 citation statements); references: 21 publications.
“…First, only moderate stability of rater effects (r = 0.6) was found across the two monitoring systems, somewhat worryingly suggesting that different impressions of rater performance could be given by the adoption of a particular system. Other studies too have shown instability in rater effects (Baird et al., 2013; Congdon & McQueen, 2000; Hoskens & Wilson, 2001; Harik et al., 2009; Lamprianou, 2006; Myford & Wolfe, 2009), which might be explained by small sample sizes in the monitoring checks.…”
Section: Discussion (mentioning)
confidence: 95%
“…In a study of quality checks on a mathematics high school examination, Wilson & Case (2000) found significant supervisor group training ('table leader') effects, but could not establish an association with the subsequent behaviour of raters within teams. Hoskens & Wilson (2001) also found supervisor effects in their analysis of rating quality checks for a high school Economics examination, with one team of raters being significantly more severe.…”
Section: Introduction (mentioning)
confidence: 94%
“…For example, it would be of basic interest to determine if detection was invariant for a given set of observers over repeated sessions; similar values of d_j across sessions would provide evidence that d_j measures a stable characteristic. This is analogous to research in item response theory that has studied the invariance of item parameters across different groups of examinees (see Hambleton & Swaminathan, 1985; Lord, 1980) or over time (e.g., Hoskens & Wilson, 2001), and to research in confirmatory factor analysis that has studied the invariance of factor loadings across groups or time (e.g., Alwin & Jackson, 1981; Byrne, Shavelson, & Muthén, 1989). Note that, from the view via SDT, one would expect that discrimination might be invariant, but not the criteria (so there is only a partial invariance); also note that conclusions with respect to the criteria can differ, depending on which criteria measure is used, as discussed previously.…”
Section: Validation (mentioning)
confidence: 99%
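The passage above relies on signal detection theory (SDT), in which discrimination and response criterion are separate quantities. A minimal sketch, assuming the standard equal-variance indices d′ = z(H) − z(F) and c = −[z(H) + z(F)]/2 (the citing paper's d_j and its criteria measures may be defined differently), shows how two sessions can share the same discrimination while the criterion shifts, which is the partial invariance the passage describes. The function name and the hit and false-alarm rates are illustrative only.

```python
from statistics import NormalDist

z = NormalDist().inv_cdf    # inverse standard normal CDF
phi = NormalDist().cdf      # standard normal CDF


def sdt_indices(hit_rate, false_alarm_rate):
    """Equal-variance SDT indices; both rates must lie strictly between 0 and 1."""
    zh, zf = z(hit_rate), z(false_alarm_rate)
    d_prime = zh - zf            # discrimination
    criterion = -(zh + zf) / 2   # response bias (negative = more liberal)
    return d_prime, criterion


# Session 1: hypothetical observer with H = 0.80, F = 0.20.
d1, c1 = sdt_indices(0.80, 0.20)

# Session 2: both z-scores shifted up by 0.5 (the observer says "yes" more
# readily), so the discrimination is unchanged and only the criterion moves.
d2, c2 = sdt_indices(phi(z(0.80) + 0.5), phi(z(0.20) + 0.5))

print(f"session 1: d' = {d1:.3f}, c = {c1:.3f}")
print(f"session 2: d' = {d2:.3f}, c = {c2:.3f}")
```

Comparing d′ across sessions tests invariance of discrimination; comparing c tests invariance of the criterion, which the passage expects to hold less often.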
“…For example, the reliability of scores on the communication portion of the United States Medical Licensing Examination (USMLE®) Clinical Skills Examination increased from 0.77 to 0.84 when an OLS model was used to adjust scores for differences in case-SP leniency (Harik et al 2009), which is roughly equivalent to increasing the length of the examination by 50%. A potential limitation of statistical adjustment is that a rater's leniency may fluctuate or drift over time (McKinley and Boulet 2004; Hoskens and Wilson 2001). However, prior research suggests that leniency indices for communication skills are fairly stable for 2-3 months (Harik et al 2009).…”
Section: Introduction (mentioning)
confidence: 97%
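The "roughly 50%" equivalence in the passage above can be checked with the Spearman–Brown prophecy formula, ρ_k = kρ / (1 + (k − 1)ρ), assuming that formula is the basis of the comparison (the quote does not say). Solving for the lengthening factor gives k = ρ_k(1 − ρ) / [ρ(1 − ρ_k)]. The sketch below, with hypothetical function names, computes both directions.

```python
def spearman_brown(reliability, k):
    """Predicted reliability when test length is multiplied by k (Spearman-Brown)."""
    return k * reliability / (1 + (k - 1) * reliability)


def lengthening_factor(current, target):
    """Length multiplier k needed to raise reliability from current to target."""
    return target * (1 - current) / (current * (1 - target))


k = lengthening_factor(0.77, 0.84)
print(f"k = {k:.2f}")                                    # k = 1.57
print(f"1.5x length: {spearman_brown(0.77, 1.5):.3f}")   # 1.5x length: 0.834
```

Under this assumption k ≈ 1.57, a test about 57% longer, so "roughly 50%" is a loose approximation; lengthening by exactly 50% would predict a reliability of about 0.83.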