Kathleen Blake Yancey (1999) has described the history of writing assessment as a tug-of-war between the competing goals of validity and reliability. For the first forty years of the twentieth century, validity reigned; the College Board entrance examinations were developed and read by joint committees of college and high school teachers. Student essays demonstrated complex understandings of Shakespeare and Milton, reflecting the highest aims of a rich curriculum. The scoring of the essays was notoriously unreliable, however, with little consistency from one scorer to another (Elliot, 2003).

For the next thirty years, reliability dominated writing assessment; multiple-choice tests provided highly consistent scoring. Faculty complained, however, that the tests lacked validity because students never actually had to write anything. The problem appeared to be solved in the early 1970s when the Educational Testing Service developed the technique of holistic scoring. Finally, students would do some real writing, and the holistic scores would be consistent from one rater to another. The hope was fairly short-lived. To achieve high rates of inter-rater reliability (0.80 on a scale of 0.0 to 1.0 was considered desirable), both the writing prompts and the scoring mechanisms were so highly structured and simplified that the writing tests no longer resembled the kind of writing tasks that students performed in courses. Reliability won out, and validity was sacrificed.

The study reported here is an attempt to achieve both high validity and high reliability. Although the study was small, with only ten participants, the results were promising, indicating that further research with larger and different populations should be conducted.
The Research Question

A national movement to train teachers in holistic scoring was prominent in the late 1970s, and although it was widely acclaimed, the intensive face-to-face training was time-consuming and expensive and thus quickly dropped. Many departments of English or of writing and rhetoric still employ common assessments to great benefit, developing a shared understanding of standards and a collaborative attitude that extends beyond assessment. These efforts are often viewed as a burdensome commitment of time, however, and of course, they cannot easily be extended beyond a local venue.