Previous research in second language writing has shown that, when scoring performance assessments, even trained raters can exhibit significant differences in severity. When raters disagree, using discussion to try to reach a consensus is one popular form of score resolution, particularly in contexts with limited resources, as it does not require adjudication by a third rater. However, from an assessment validation standpoint, questions remain about the impact of negotiation on the scoring inference of a validation argument (Kane, 2006, 2012). Thus, this mixed-methods study evaluates the impact of score negotiation on scoring consistency in second language writing assessment, as well as negotiation's potential contributions to raters' understanding of test constructs and the local curriculum. Many-faceted Rasch measurement (MFRM) was used to analyze scores (n = 524) from the writing section of an EAP placement exam and to quantify how negotiation affected rater severity, self-consistency, and bias toward individual categories and test takers. Semi-structured interviews with raters (n = 3) documented their perspectives about how negotiation affects scoring and teaching. In this study, negotiation did not change rater severity, though it greatly reduced measures of rater bias. Furthermore, rater comments indicated that negotiation supports a nuanced understanding of the rubric categories and increases positive washback on teaching practices.
Cloze tests have been the subject of numerous studies regarding their function and use in both first language and second language contexts (e.g., Jonz & Oller, 1994; Watanabe & Koyama, 2008). From a validity standpoint, one area of investigation has been the extent to which cloze tests measure reading ability beyond the sentence level. Using test data from 50 30-item cloze passages administered to 2,298 Japanese and 5,170 Russian EFL students, this study examined the degree to which linguistic features of cloze passages and items influenced item difficulty. Using a common set of 10 anchor items, all 50 tests were modeled in terms of person ability and item difficulty onto a single scale using many-faceted Rasch measurement (k = 1314). Principal components analysis was then used to categorize 25 linguistic item- and passage-level variables for the 50 cloze tests and their respective items, from which three components each for the passage- and item-level variables were identified. These six factors along with item difficulty were then entered into both a hierarchical structural equation model and a linear multiple regression to determine
Originally designed to measure reading and passage comprehension in L1 readers, cloze tests continue to be used for L2 assessment purposes. However, there remain disputes about whether cloze items can measure beyond local comprehension information, as well as whether they are purely a test of reading or whether performance can be generalized to broader claims about proficiency. The current study sets out to address both of these issues by drawing on a large pool of cloze items (k = 449) taken from 15 cloze passages that were administered to 675 L1 and 2,246 L2 examinees. In conjunction with test scores, a large-scale L1 experiment was conducted using Amazon’s Mechanical Turk to determine the minimum level of context required to answer each item. Using Rasch analysis, item function was compared across both groups, with results indicating that cloze items can draw on information at both the sentence and passage level. This seems to suggest further that cloze tests generally tend to measure reading in both L1 and L2 examinees. These findings have important implications for the continued use of cloze tests, particularly in classroom and high-stakes contexts where they are commonly found.