“…Indeed, previous co-occurrence-based models often benefited from the removal of function words and other high-frequency “stop words” for solving various NLP tasks (e.g., Bernardi et al., 2013; Herbelot & Baroni, 2017; Lazaridou et al., 2017). Even for state-of-the-art ANN language models, it has recently been shown that retaining only the content words in linguistic context has little effect on next-word prediction performance, with performance varying as a function of how much of the lexical content is included (i.e., higher performance when keeping all content words vs. keeping only subsets, such as keeping only the nouns and verbs, or keeping only the nouns; O’Connor & Andreas, 2021). Whereas function words have been shown to have a sizable effect on ANN next-word prediction performance within local sentence contexts (because they help ensure grammaticality; Khandelwal et al., 2018), content words strongly influence prediction performance both within local and more extended contexts (Khandelwal et al., 2018; O’Connor & Andreas, 2021).…”