2019
DOI: 10.1016/j.asw.2018.12.002

Exploring the correspondence between traditional score resolution methods and person fit indices in rater-mediated writing assessments

Cited by 14 publications (22 citation statements)
References 16 publications
“…However, as mentioned earlier, rater effects tend to be persistent even with extensive training. Moreover, even when assessment systems incorporate procedures such as double scoring and score resolution, rater effects and person misfit can still be present in the final ratings (e.g., Wind & Walker, 2019). As researchers have documented in previous studies, evidence that certain raters are exhibiting rater effects implies that the ratings that those raters provide should be interpreted with caution.…”
Section: Discussion · mentioning
confidence: 99%
“…In many cases, particularly where high stakes are associated with assessment results, it may be appropriate for an additional rater to rescore student performances that are flagged for misfit, to recommend that alternative assessments be used to evaluate those students, or both. For example, these analyses could be used as supplements to common quality control procedures, such as rater agreement analyses that are used to identify students whose performances warrant rescoring (for a discussion of the alignment between person fit and rater agreement, please see Myford & Wolfe, 2002 and Wind & Walker, 2019). Regardless, the most important takeaway from such findings is that if person fit indices suggest that achievement estimates are not appropriate, those estimates should not be used in the same way as estimates for students who do not display person misfit.…”
Section: Discussion · mentioning
confidence: 99%
“…We also simulated three different rating designs to reflect different procedures for establishing connectivity in sparse rating designs that researchers have reported in previous real-data studies of rater effects, as we discussed in the introduction section of this article and illustrated in Figure 1. Specifically, we simulated an overlapping performances design, as reported in studies such as Barkaoui (2011) and Wind and Walker (2019). Second, we simulated an MC item link design similar to the design reported by Wind (2013, 2018) and in many large-scale mixed-format assessments (e.g., National Center for Education Statistics, n.d.).…”
Section: Methods · mentioning
confidence: 99%
“…In theory, more than two raters could score each performance as is possible given available resources. Examples of similar overlapping performances designs in real data analyses of rater effects have been published in studies by Barkaoui (2011) and Wind and Walker (2019).…”
mentioning
confidence: 97%