2016
DOI: 10.1016/j.asw.2015.06.002
Features of difficult-to-score essays

Cited by 47 publications (34 citation statements)
References 27 publications
“…This finding supports the findings of previous research such as those of Raczynski, Cohen, Engelhard, and Lu (2015), which demonstrated that some essays are significantly more difficult to be scored accurately than other essays for professional raters like professors and reviewers. This finding is also in line with what Wolfe, Song, and Jiao (2016) found as features of difficult-to-score essays for professional raters.…”
Section: A Comparison Of Teachers' and Students' Perceptions Of L2 Wr… (supporting)
confidence: 91%
“…Furthermore, results from these analyses do not provide substantive information regarding the implications of DRF, potential causes of DRF, or specific issues that individuals who train raters need to address during rater training in order to improve the fairness of an assessment. In order to understand better the substantive implications of DRF and identify potential issues to address in rater training, practitioners might consider using qualitative approaches such as cognitive interviews (Wang, Engelhard, Raczynski, Song, & Wolfe, 2017; Wolfe, Kao, & Ranney, 1998) and textual analyses of student compositions (Wind, Stager, & Patil, 2017; Wolfe, Song, & Jiao, 2016) in conjunction with the quantitative results.…”
Section: Discussion (mentioning)
confidence: 99%
“…To model cumulative response processes, the many-facet Rasch model (Linacre, 1989) is proposed by Engelhard (1996) to examine rater accuracy. This model is also named the rater accuracy model (RAM; Wolfe, Song, & Jiao, 2016), because it provides rater accuracy estimates on a latent continuum. The RAM can be specified as follows:

ln(π_{ij,k} / π_{ij,(k−1)}) = λ_j − δ_i − τ_k,

where π_{ij,k} = probability of receiving an accuracy rating k on essay i for rater j; π_{ij,(k−1)} = probability of receiving an accuracy rating k − 1 on essay i for rater j; δ_i = difficulty of essay i to be scored accurately; λ_j = accuracy of rater j; τ_k = difficulty of reaching category k relative to category k − 1 of accuracy ratings.…”
Section: Evaluating Rater Accuracy (mentioning)
confidence: 99%
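The adjacent-category logits in the RAM quoted above can be unpacked into full category probabilities by cumulatively summing the logits and normalizing. The sketch below is illustrative only — the function name and parameter values are invented for the example and do not come from Wolfe, Song, and Jiao (2016):

```python
import math

def ram_probabilities(delta_i, lambda_j, taus):
    """Category probabilities under a rater accuracy model (RAM).

    Adjacent-category logit: ln(pi_k / pi_{k-1}) = lambda_j - delta_i - tau_k,
    so the unnormalized log-weight of category k is the cumulative sum of the
    logits up to k (category 0 is the reference, with log-weight 0).

    delta_i : difficulty of essay i to be scored accurately
    lambda_j: accuracy of rater j
    taus    : thresholds tau_1..tau_K for K+1 accuracy categories
    """
    logits = [lambda_j - delta_i - tau for tau in taus]
    log_w = [0.0]
    for lg in logits:
        log_w.append(log_w[-1] + lg)  # cumulative sum of adjacent logits
    total = sum(math.exp(w) for w in log_w)
    return [math.exp(w) / total for w in log_w]

# Hypothetical values: a fairly easy-to-score essay, a moderately accurate rater
probs = ram_probabilities(delta_i=-0.5, lambda_j=1.0, taus=[-1.0, 1.0])
assert abs(sum(probs) - 1.0) < 1e-9  # probabilities sum to one
```

As expected from the model, raising λ_j (or lowering δ_i) shifts probability mass toward the higher accuracy categories.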