Frequency is often the only variable considered when researchers or teachers develop vocabulary materials for second language (L2) learners. However, researchers have also found that many other variables affect vocabulary acquisition. In this study, we explored the relationship between L2 vocabulary acquisition and a variety of lexical characteristics using vocabulary recognition test data from L2 English learners. Conducting best-subsets multiple regression analysis to explore all possible combinations of variables, we produced a best-fitting model of vocabulary difficulty consisting of six variables (R² = .37). The fact that many variables contributed significantly to the regression model, and that a large amount of variance remained unexplained by the frequency variable considered in this study, indicates that much more than frequency alone affects the likelihood that learners will learn certain L2 words.
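As a rough illustration of the approach described in this abstract, the sketch below fits an ordinary least squares model for every subset of a candidate predictor set and keeps the model with the highest adjusted R². The predictor and outcome names are hypothetical placeholders; the abstract does not report the study's actual variables, selection criterion, or software.

```python
# Hedged sketch of best-subsets multiple regression: fit an OLS model for every
# combination of candidate predictors and keep the one with the highest adjusted R^2.
# Predictor names are illustrative placeholders, not the study's actual variables.
from itertools import combinations

import pandas as pd
import statsmodels.api as sm


def best_subsets(df: pd.DataFrame, outcome: str, predictors: list[str]):
    """Return (best_model, best_predictors) ranked by adjusted R-squared."""
    best_model, best_subset = None, None
    for k in range(1, len(predictors) + 1):
        for subset in combinations(predictors, k):
            X = sm.add_constant(df[list(subset)])
            model = sm.OLS(df[outcome], X).fit()
            if best_model is None or model.rsquared_adj > best_model.rsquared_adj:
                best_model, best_subset = model, subset
    return best_model, best_subset


# Example usage with a hypothetical data frame of item-level difficulty scores:
# model, subset = best_subsets(items, "difficulty", ["frequency", "length", "concreteness"])
# print(subset, round(model.rsquared, 2))
```

Because every subset is fitted, the search grows exponentially with the number of candidate predictors, which remains manageable for the handful of lexical characteristics typically examined in studies of this kind.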
This study tests three measures of lexical diversity (LD), each using five operationalizations of word types. The measures include MTLD (measure of textual lexical diversity), MTLD-W (moving average MTLD with wrap-around measurement), and MATTR (moving average type-token ratio). Each of these measures is tested with types operationalized as orthographic forms, lemmas using automated POS tags, lemmas using manually corrected POS tags, flemmas (list-based lemmas that do not distinguish between parts of speech), and word families. These measures are applied to 60 narrative texts written in English by adolescent native speakers of English (n = 13), Finnish (n = 31), and Swedish (n = 16). Each individual LD measure is evaluated in relation to how well it correlates with the mean LD ratings of 55 human raters whose inter-rater reliability was exceedingly high (Cronbach’s alpha = .980). The overall results show that the three measures are comparable, but two of the operationalizations of types produce mixed results across measures.
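Of the three measures named in this abstract, MATTR is the most straightforward to sketch: slide a fixed-size window through the text, compute a type-token ratio in each window, and average the results. The function below assumes types are operationalized as lowercased orthographic forms and uses a window of 50 tokens, a common default rather than a setting reported in the abstract; the lemma, flemma, and word-family operationalizations would require normalizing tokens before this step.

```python
# Hedged sketch of MATTR (moving-average type-token ratio) with types
# operationalized as orthographic word forms. Window size 50 is an assumption,
# not necessarily the setting used in the study.
def mattr(tokens: list[str], window: int = 50) -> float:
    """Average the type-token ratio over every window of `window` consecutive tokens."""
    tokens = [t.lower() for t in tokens]
    if len(tokens) < window:              # fall back to plain TTR for very short texts
        return len(set(tokens)) / len(tokens)
    ttrs = [
        len(set(tokens[i:i + window])) / window
        for i in range(len(tokens) - window + 1)
    ]
    return sum(ttrs) / len(ttrs)


# Example: mattr("the cat sat on the mat because the cat was tired".split())
```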
Discourse Completion Tasks (DCTs) have been one of the most popular tools in pragmatics research. Yet, many have criticized DCTs for their lack of authenticity (e.g., Culpeper, Mackey, & Taguchi, 2018; Nguyen, 2019). We propose that corpora can serve as resources in designing and evaluating DCTs. We created a DCT using advice-seeking prompts from the Q+A corpus (Baker & Egbert, 2016) and then administered the DCT to 33 participants. We evaluated the DCT by (1) comparing the linguistic form and the semantic content of the participants’ DCT responses (i.e., advice-giving expressions) with authentic data from the corpus and (2) interviewing the participants about the quality of the instrument. Chi-square tests between the DCT data and the corpus data revealed no significant differences in advice-giving expressions in terms of either the overall level of directness (χ²(2, N = 660) = 6.94, p = .03, V = .10) or linguistic realization (χ²(8, N = 660) = 17.75, p = .02, V = .16), and a significant difference with a small effect size in terms of semantic content (χ²(6, N = 512) = 30.35, p < .01, V = .24). Taken together with the interview data, our findings indicate that corpora are useful in designing DCTs.
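For readers unfamiliar with the statistics reported in this abstract, the snippet below shows the general shape of such a comparison: a chi-square test of independence between data source (DCT vs. corpus) and response category, with Cramér's V as the effect size. The category labels and counts are invented for illustration and are not the study's data.

```python
# Hedged sketch of a DCT-versus-corpus comparison: chi-square test of independence
# plus Cramér's V. The contingency table below is illustrative only.
import numpy as np
from scipy.stats import chi2_contingency

# Rows: data source; columns: e.g. direct / conventionally indirect / hint (invented counts).
table = np.array([
    [120, 150, 60],   # DCT responses
    [130, 140, 60],   # corpus examples
])

chi2, p, dof, expected = chi2_contingency(table)
n = table.sum()
cramers_v = np.sqrt(chi2 / (n * (min(table.shape) - 1)))
print(f"chi2({dof}) = {chi2:.2f}, p = {p:.3f}, V = {cramers_v:.2f}")
```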