This book enables practitioners to apply statistics effectively to the development and use of language assessments. The Workbook and CD contain datasets from actual language assessments and data analysis exercises.
The complexities of task-based language performance assessment (TBLPA) are leading language testers to reconsider many of the fundamental issues about what we want to assess, how we go about it and what sorts of evidence we need to provide in order to justify the ways in which we use our assessments. One claim of TBLPA is that such assessments can be used to make predictions about performance on future language use tasks outside the test itself. I argue that there are several problems with supporting such predictions. These problems are related to task selection, generalizability and extrapolation. Because of the complexity and diversity of tasks in most 'real-life' domains, the evidence of content relevance and representativeness that is required to support the use of test scores for prediction is extremely difficult to provide. A more general problem is the way in which difficulty is conceptualized, both in the way tasks are described and in current measurement models. The conceptualization of 'difficulty features' confounds task characteristics with test-takers' language ability and introduces a hypothetical 'difficulty' factor as a determinant of test performance. In current measurement models, 'difficulty' is essentially an artifact of test performance, and not a characteristic of assessment tasks themselves. Because of these problems, current approaches to using task characteristics alone to predict difficulty are unlikely to yield consistent or meaningful results. As a way forward, a number of suggestions are provided for both language testing research and practice.
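To make the measurement-model point concrete, consider the Rasch model, used here purely as an illustrative example (the argument applies to current measurement models generally, not to this model alone):

\[ P(X_{pi} = 1 \mid \theta_p, b_i) = \frac{\exp(\theta_p - b_i)}{1 + \exp(\theta_p - b_i)} \]

where \(\theta_p\) is test-taker p's ability and \(b_i\) is the difficulty of item i. Because \(b_i\) is estimated jointly with \(\theta_p\) from the observed response matrix, it summarizes how examinees actually performed on the item rather than any intrinsic property of the task, which is the sense in which 'difficulty' is an artifact of test performance rather than a characteristic of the assessment task itself.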
Although research on the cloze test has offered differing evidence regarding what language abilities it measures, there is a general consensus among researchers that not all the deletions in a given cloze passage measure exactly the same abilities. An important issue for test developers, therefore, is the extent to which it is possible to design cloze tests that measure specific abilities. Two cloze tests were prepared from the same text. In one, different types of deletions were made according to the range of context required for closure, while in the other a fixed-ratio deletion procedure was followed. These tests were administered to 910 university and pre-university students, including both native and non-native speakers of English, with approximately half assigned at random to take the fixed-ratio test and the other half taking the rationally deleted test. While both tests were equally reliable and had equal criterion validity, the fixed-ratio test was significantly more difficult. Analyses of responses to different types of deletions suggest that the difficulty of cloze items is a function of the range of syntactic and discourse context required for closure. The study also provides practical and empirically supported criteria for making rational deletions and suggests that cloze tests can be designed to measure a range of abilities.
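A minimal Python sketch of the fixed-ratio procedure may help fix the contrast between the two deletion methods (the deletion ratio and blank marker below are illustrative assumptions, not the study's actual parameters; rational deletion, by contrast, requires human judgment about the range of context needed for closure and cannot be reduced to a mechanical rule):

# Illustrative fixed-ratio cloze deletion: every n-th word becomes a blank.
# The default ratio of 7 and the blank marker are hypothetical choices.
def fixed_ratio_cloze(text: str, ratio: int = 7, blank: str = "____") -> str:
    words = text.split()
    return " ".join(
        blank if (i + 1) % ratio == 0 else w
        for i, w in enumerate(words)
    )

passage = ("Cloze tests are constructed by deleting words from a passage "
           "and asking readers to restore the missing items.")
print(fixed_ratio_cloze(passage, ratio=5))
# Cloze tests are constructed ____ deleting words from a ____ and asking readers to ____ the missing items.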
The notion of communicative competence has received wide attention in the past few years, and numerous attempts have been made to define it. Canale and Swain (1980) have reviewed these attempts and have developed a framework which defines several hypothesized components of communicative competence and makes the implicit claim that tests of components of communicative competence measure different abilities. In this study we examine the construct validity of some tests of components of communicative competence and of a hypothesized model. Three distinct traits (linguistic competence, pragmatic competence and sociolinguistic competence) were posited as components of communicative competence. A multitrait-multimethod design was used, in which each of the three hypothesized traits was tested using four methods: an oral interview, a writing sample, a multiple-choice test and a self-rating. The subjects were 116 adult non-native speakers of English from various language and language-learning backgrounds. Confirmatory factor analysis was used to examine the plausibility of several causal models, involving from one to three trait factors. The results indicate that the model which best fits the data includes a general factor and two specific trait factors: grammatical/pragmatic competence and sociolinguistic competence. The relative importance of the trait and method factors in the various tests used is also indicated.
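In generic multitrait-multimethod form (a standard textbook formulation, not necessarily the exact specification fitted in the study), each observed score for trait t measured by method m decomposes as

\[ x_{tm} = \lambda^{T}_{tm}\, T_t + \lambda^{M}_{tm}\, M_m + \delta_{tm} \]

where \(T_t\) and \(M_m\) are latent trait and method factors and \(\delta_{tm}\) is a uniqueness term. The competing causal models differ in how many trait factors \(T_t\) are posited (from one to three here), and comparisons of model fit are what single out the general-plus-two-specific-factors solution reported above.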