The Common European Framework of Reference (CEFR; Council of Europe, 2001) provides a competency model that is increasingly used as a point of reference to compare language examinations. Nevertheless, aligning examinations to the CEFR proficiency levels remains a challenge. In this article, we propose a new, level-centered approach to designing and aligning writing tasks in line with the CEFR levels. Much work has been done on assessing writing via tasks spanning several levels of proficiency, but little research exists on a level-specific approach, in which one task targets one specific proficiency level. In our study, situated in a large-scale assessment project where such a level-specific approach was employed, we investigate the influence of the design factors (tasks, assessment criteria, raters, and student proficiency) on the variability of ratings, using descriptive statistics, generalizability theory, and multifaceted Rasch modeling. Results show that the level-specific approach yields plausible inferences about task difficulty, rater harshness, rating criteria difficulty, and student distribution. Moreover, Rasch analyses show a high level of consistency between a priori task classifications in terms of CEFR levels and empirical task difficulty estimates. This allows for a test-centered approach to standard setting by suggesting empirically grounded cut-scores in line with the CEFR proficiency levels targeted by the tasks.
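For orientation, a standard formulation of the many-facet Rasch model with facets for students, tasks, raters, and rating criteria is sketched below; it is offered as a generic illustration, and the parameterization actually used in the study reported here may differ.

\[
\ln\!\left(\frac{P_{ntjck}}{P_{ntjc(k-1)}}\right) = \theta_n - \delta_t - \alpha_j - \beta_c - \tau_k ,
\]

where \( \theta_n \) denotes the ability of student \( n \), \( \delta_t \) the difficulty of task \( t \), \( \alpha_j \) the harshness of rater \( j \), \( \beta_c \) the difficulty of rating criterion \( c \), and \( \tau_k \) the threshold between adjacent score categories \( k-1 \) and \( k \).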
INTRODUCTION
Since its publication in 2001, the Common European Framework of Reference (CEFR; Council of Europe, 2001) has increasingly become a key reference document for language test developers who seek to gain widespread acceptance for their tests within Europe. The CEFR represents a synthesis of key aspects of second and foreign language learning, teaching, and assessment. It primarily serves as a consciousness-raising device for anyone working in these areas and as an instrument for the self-assessment of language ability via calibrated scales. In other words, it is not a how-to guide for developing language tests, even though it can serve as a basis for such endeavors (see, e.g., Alderson & Huhta, 2005, or North, 2004). As a result, many test developers are unsure about how to use the information in the CEFR to design tests that are aligned with the CEFR, both in philosophy and in practice. This article is situated within this broader context and aims to provide insight into the question of how writing tasks can be aligned with the CEFR levels.

Of importance, writing ability is usually measured using open tasks that can elicit a range of written responses. These, in turn, are generally scored by trained raters using a rating scale that covers several bands or levels of proficiency; we call this approach a multilevel approach. However, if one needs to determine whether a student has reached one specific level...