1990
DOI: 10.1177/016327879001300405
Judge Consistency and Severity Across Grading Periods

Abstract: The purpose of this research project was to confirm that differences in the severity of judges and the stringency of grading periods occur, regardless of the nature of the assessment or the examination materials used. Three rather different examinations that require judges were analyzed, using an extended Rasch model to determine whether differences in judge severity and grading-period stringency were observable for all three examinations. Significant variation in judge severity and some variation across gradi…

Cited by 83 publications (57 citation statements)
References 6 publications
“…Some researchers contend that the level of severity a rater exercises is a relatively stable effect that changes little over time and is not modifiable by training (Bernardin and Pence, 1980; Lunz and Stahl, 1990; Lunz, Stahl, and Wright, 1996; O'Neill and Lunz, 1996; O'Neill and Lunz, 2000; Raymond, Webb, and Houston, 1991). By contrast, other researchers argue that some raters' levels of severity can shift substantially from reading to reading (Lumley and McNamara, 1993; Myford, Marr, and Linacre, 1996), from essay topic to essay topic (Bridgeman, Morgan, and Wang, 1996; Weigle, 1999), and from day to day within the same reading (Bleistein and Maneckshana, 1995; Braun, 1988; Coffman and Kurfman, 1968; Morgan, 1998; Wilson and Case, 2000; Wood and Wilson, 1974).…”
Section: Variation In Rater Severity
confidence: 99%
“…Over the last several years, a number of performance assessment programs interested in examining and understanding sources of variability in their assessment systems have been experimenting with Linacre's (1999a) Facets computer program as a monitoring tool (see, for example, Heller, Sheingold, & Myford, 1998; Linacre, Engelhard, Tatum, & Myford, 1994; Lunz & Stahl, 1990; Myford & Mislevy, 1994; Paulukonis, Myford, & Heller, in press). In this study, we build on the pioneering efforts of researchers who are employing many-facet Rasch measurement to answer questions about complex rating systems for evaluating speaking and writing.…”
Section: Review Of The Literature
confidence: 99%
“…The sixth column (z-scores, or standardized fit statistics) shows the test version rater bias estimate at this phase. Bias is the difference between expected and observed ratings of the obtained data, which is then divided by its standard error to derive the z-score (Lunz & Stahl, 1990). The most preferable z value is 0, which indicates that the data match the expected model, and thus, no rater bias.…”
Section: Results
confidence: 99%
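The z-score computation quoted above can be sketched in a few lines: the bias is the sum of observed-minus-expected ratings, divided by its standard error (here taken as the square root of the summed model variances of the ratings). This is a minimal illustration, not the citing paper's implementation; the function name and all data below are hypothetical.

```python
import math

def rater_bias_z(observed, expected, variances):
    """Standardized rater bias: sum(obs - exp) / sqrt(sum of model variances)."""
    bias = sum(o - e for o, e in zip(observed, expected))
    standard_error = math.sqrt(sum(variances))
    return bias / standard_error

# Hypothetical ratings from one rater across five examinees:
obs = [4, 3, 5, 2, 4]            # observed ratings
exp = [3.6, 3.1, 4.4, 2.5, 3.8]  # model-expected ratings
var = [0.8, 0.9, 0.7, 0.85, 0.8] # model variance of each rating
z = rater_bias_z(obs, exp, var)  # a z near 0 indicates no detectable bias
```

A rater whose z-score departs substantially from 0 rates systematically more severely or leniently than the model expects.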
“…Considerable evidence of poor rater consistency has been reported in some research (e.g., Lunz & Stahl, 1990; Trace, Janssen, & Meier, 2017), and even if adequate consistency might have been reported in most research, it is mostly on the basis of correlations alone. That is, even a perfect correlation might ignore systematic variations among raters.…”
Section: Rater Behavior In Oral Performance Assessment
confidence: 99%