Within the fields of applied linguistics and language testing, there has been a recent interest in rating scales, and how rating scales are constructed (Upshur and Turner, 1995). This is not surprising, as there is increasing concern that scores from language tests should be meaningful in applied linguistics terms. However, applied linguistics research and second language acquisition research have done little to provide descriptions of language abilities or performances which can be operationalized by language testers. Many existing descriptors for bands in rating scales are therefore barely tenable as definitions of constructs. This article looks at the definition of fluency in the literature, and proposes a qualitative and quantitative approach which may be used to produce a 'thick' description of language use, which can be used in rating scale construction. A fluency rating scale is described, and its reliability and validity assessed. The article suggests that validity considerations must be addressed in the construction phase of developing rating scales, through the careful consideration of the linguistic meaning of constructs, rather than merely as a post hoc enterprise.
Rating scale design and development for testing speaking is generally conducted using one of two approaches: the measurement-driven approach or the performance data-driven approach. The measurement-driven approach prioritizes the ordering of descriptors onto a single scale. Meaning is derived from the scaling methodology and the agreement of trained judges as to the place of any descriptor on the scale. The performance data-driven approach, on the other hand, places primary value upon observations of language performance, and attempts to describe performance in sufficient detail to generate descriptors that bear a direct relationship with the original observations of language use. Meaning is derived from the link between performance and description. We argue that measurement-driven approaches generate impoverished descriptions of communication, while performance data-driven approaches have the potential to provide richer descriptions that offer sounder inferences from score meaning to performance in specified domains. With reference to original data and the literature on travel service encounters, we devise a new scoring instrument, a Performance Decision Tree (PDT). This instrument prioritizes what we term 'performance effect' by explicitly valuing and incorporating performance data from a specific communicative context. We argue that this avoids the reification of ordered scale descriptors which we find in measurement-driven scale construction for speaking tests.