This study explores the construct of lexical sophistication and its applications for measuring second language lexical and speaking proficiency. In doing so, the study introduces the Tool for the Automatic Analysis of LExical Sophistication (TAALES), which calculates text scores for 135 classic and newly developed lexical indices related to word frequency, range, bigram and trigram frequency, academic language, and psycholinguistic word information. TAALES is freely available; runs on Windows, Mac, and Linux operating systems; and has a simple graphical user interface that allows for batch processing of .txt files. The tool is fast, reliable, and outputs results to a comma‐separated value file that can be accessed using spreadsheet software. The study examines the ability of TAALES indices to explain the variance in human judgments of lexical proficiency and speaking proficiency for second language (L2) learners. Overall, these indices were able to explain 47.5% of the variance in holistic scores of lexical proficiency and 48.7% of the variance in holistic scores of speaking proficiency. This study has important implications for second language acquisition, for assessing L2 learners' productive skills (writing and speaking), and for L2 pedagogy. Limitations and future directions are also discussed.
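As a rough illustration of what a word-frequency index of lexical sophistication might look like, the sketch below averages the log10 frequency of the tokens in a text. It is not the TAALES implementation: the function name `mean_log_frequency`, the table `toy_freq`, and its counts are hypothetical, and skipping words that are missing from the table is an assumption made here for simplicity.

```python
import math


def mean_log_frequency(tokens, freq_table):
    """Mean log10 frequency of the tokens found in freq_table.

    Tokens missing from the table are skipped (an assumption of this sketch).
    Returns None when no token is found.
    """
    logs = [math.log10(freq_table[t]) for t in tokens if freq_table.get(t, 0) > 0]
    return sum(logs) / len(logs) if logs else None


# Hypothetical counts for illustration only; not real corpus data.
toy_freq = {"the": 1000, "cat": 10}
score = mean_log_frequency("the cat zyx".split(), toy_freq)
```

Under this kind of index, a lower mean suggests that the text relies on rarer words, which is usually read as greater lexical sophistication.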
In this study, a corpus of expert-graded essays, based on a standardized scoring rubric, is computationally evaluated so as to distinguish the differences between those essays that were rated as high and those rated as low. The automated tool, Coh-Metrix, is used to examine the degree to which high- and low-proficiency essays can be predicted by linguistic indices of cohesion (i.e., coreference and connectives), syntactic complexity (e.g., number of words before the main verb, sentence structure overlap), the diversity of words used by the writer, and characteristics of words (e.g., frequency, concreteness, imageability). The three most predictive indices of essay quality in this study were syntactic complexity (as measured by number of words before the main verb), lexical diversity (as measured by the Measure of Textual Lexical Diversity), and word frequency (as measured by Celex, logarithm for all words). None of the 26 validated indices of cohesion from Coh-Metrix showed differences between high- and low-proficiency essays, and none correlated with essay ratings. These results indicate that the textual features that characterize good student writing are not aligned with those features that facilitate reading comprehension. Rather, essays judged to be of higher quality were more likely to contain linguistic features associated with text difficulty and sophisticated language.
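The Measure of Textual Lexical Diversity (MTLD) named above can be sketched as follows. This is a minimal sketch of the commonly described procedure, not the Coh-Metrix code: a running type-token ratio (TTR) is tracked, a "factor" is counted each time the TTR drops to a threshold (0.72 by convention), a partial factor is added for the leftover tokens, and the forward and backward passes are averaged. The function names, the `<=` comparison at the threshold, and returning infinity when no factor accumulates are assumptions of this sketch.

```python
def mtld_one_direction(tokens, threshold=0.72):
    """One MTLD pass: number of tokens divided by the factor count."""
    factors = 0.0
    types = set()
    count = 0
    for tok in tokens:
        count += 1
        types.add(tok)
        if len(types) / count <= threshold:
            # TTR has dropped to the threshold: close a full factor and reset.
            factors += 1.0
            types = set()
            count = 0
    if count > 0:
        # Partial factor for the tokens left over after the last reset.
        ttr = len(types) / count
        factors += (1.0 - ttr) / (1.0 - threshold)
    # Assumption of this sketch: return inf when no factor accumulated.
    return len(tokens) / factors if factors > 0 else float("inf")


def mtld(tokens, threshold=0.72):
    """Average of the forward and backward passes."""
    forward = mtld_one_direction(tokens, threshold)
    backward = mtld_one_direction(list(reversed(tokens)), threshold)
    return (forward + backward) / 2.0


repetitive = mtld(["a"] * 10)
varied = mtld("a b c d a b c d".split())
```

In this toy comparison the repetitive sequence scores lower than the varied one, which is the direction the measure is meant to capture.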
Many programs designed to compute the readability of texts are narrowly based on surface‐level linguistic features and take too little account of the processes which a reader brings to the text. This study is an exploratory examination of the use of Coh‐Metrix, a computational tool that measures cohesion and text difficulty at various levels of language, discourse, and conceptual analysis. It is suggested that Coh‐Metrix provides an improved means of measuring English text readability for second language (L2) readers, not least because three Coh‐Metrix variables, one employing lexical coreferentiality, one measuring syntactic sentence similarity, and one measuring word frequency, have correlates in psycholinguistic theory. The current study draws on the validation exercise conducted by Greenfield (1999) with Japanese EFL students, which partially replicated Bormuth's (1971) study with American students. It finds that Coh‐Metrix, with its inclusion of the three variables, yields a more accurate prediction of reading difficulty than traditional readability measures. The finding indicates that linguistic variables related to cognitive reading processes contribute significantly to better readability prediction than the surface variables used in traditional formulas. Additionally, because these Coh‐Metrix variables better reflect psycholinguistic factors in reading comprehension such as decoding, syntactic parsing, and meaning construction, the formula appears to be more soundly based and avoids criticism on the grounds of construct validity.
Spoken language data were collected from six adult second language (L2) English learners over a year-long period in order to explore the development of word polysemy and frequency use. The data were analyzed both quantitatively and qualitatively. In the first analysis, the growth of WordNet polysemy values and CELEX word frequency values was examined. For both indexes, significant growth was demonstrated from the 2nd to the 16th week of observation, after which values remained stable. Growth in word polysemy values also correlated with changes in word frequency, supporting the notion that frequency and polysemy effects in word use are related. A second analysis used the WordNet dictionary to explore qualitative changes in word sense use concerning six frequent lexical items in the learner corpus (think, know, place, work, play, and name). A qualitative analysis compared normalized frequencies for each word sense in the first trimester of the study to the later trimesters. Differences in the number of word senses used across trimesters were found for all six words. Analyses 1 and 2, taken together, support the notion that L2 learners begin to use words that have the potential for more senses during the first 4 months; learners then begin to extend the core meanings of these polysemous words. These findings provide further insights into the development of lexical proficiency in L2 learners and the growth of lexical networks.
This study introduces the second release of the Tool for the Automatic Analysis of Lexical Sophistication (TAALES 2.0), a freely available and easy-to-use text analysis tool. TAALES 2.0 is housed on a user's hard drive (allowing for secure data processing) and is available on most operating systems (Windows, Mac, and Linux). TAALES 2.0 adds 316 indices to the original tool. These indices are related to word frequency, word range, n-gram frequency, n-gram range, n-gram strength of association, contextual distinctiveness, word recognition norms, semantic network, and word neighbors. In this study, we validated TAALES 2.0 by investigating whether its indices could be used to model both holistic scores of lexical proficiency in free writes and word choice scores in narrative essays. The results indicated that the TAALES 2.0 indices could be used to explain 58% of the variance in lexical proficiency scores and 32% of the variance in word-choice scores. Newly added TAALES 2.0 indices, including those related to n-gram association strength, word neighborhood, and word recognition norms, featured heavily in these predictor models, suggesting that TAALES 2.0 represents a substantial upgrade.