2024
DOI: 10.1073/pnas.2307876121

Large-scale evidence for logarithmic effects of word predictability on reading time

Cory Shain,
Clara Meister,
Tiago Pimentel
et al.

Abstract: During real-time language comprehension, our minds rapidly decode complex meanings from sequences of words. The difficulty of doing so is known to be related to words’ contextual predictability, but what cognitive processes do these predictability effects reflect? In one view, predictability effects reflect facilitation due to anticipatory processing of words that are predictable from context. This view predicts a linear effect of predictability on processing demand. In another view, predictability effects ref…

Cited by 13 publications
(12 citation statements)
References 100 publications
“…Our findings suggest that in subtyping aphasia, machine learning models involving larger scale LLMs underperform those involving smaller LLMs (Table 4 ). This result also indirectly aligns with Oh and Schuler 81 and Shain et al 38 . They found that larger scale LLMs show a worse fit to human reading times.…”
Section: Discussion (supporting)
confidence: 90%
“…This is precisely the probability distribution we obtain from a language model after it has been given w_0, …, w_{i−1} as input. The relationship between human reading times and surprisal estimated from a language model in this fashion has been found to be approximately linear (Shain et al., 2024; Smith & Levy, 2013).…”
Section: Methods (mentioning)
confidence: 99%
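The surprisal measure discussed in the statement above is the negative log probability of a word given its preceding context, as scored by a language model. A minimal sketch of the computation, using a toy bigram model as a stand-in for a real neural LM (the corpus and function names are invented for illustration, not taken from the paper):

```python
import math
from collections import Counter

# Toy training corpus (invented); a real study would score words with a neural LM.
corpus = "the dog saw the cat and the cat saw the dog".split()

# Bigram counts and context (unigram) counts for a maximum-likelihood bigram model.
bigrams = Counter(zip(corpus, corpus[1:]))
context_counts = Counter(corpus[:-1])

def surprisal(context: str, word: str) -> float:
    """Surprisal in bits: -log2 P(word | context)."""
    p = bigrams[(context, word)] / context_counts[context]
    return -math.log2(p)

# "the" is followed by dog, cat, cat, dog -> P(cat | the) = 0.5 -> 1 bit
print(surprisal("the", "cat"))  # -> 1.0
```

Under the linear-effect finding cited above, each additional bit of surprisal is associated with a roughly constant increment in reading time.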
“…Frequency values were extracted from the SUBTLEX corpus of American film subtitles (Brysbaert & New, 2009), commonly used as a proxy for standard-English word frequency and which has been shown to correlate with reading-time behavior. Because the impact of word frequency on reading times is logarithmic as opposed to linear (Shain et al., 2024), we used the Zipf values (which are both logarithmic and standardized) as opposed to raw counts (Van Heuven et al., 2014). To avoid including noncontent words, we limited our analysis of frequency to the words in our corpora marked as a verb, noun, adjective, or adverb according to Stanza.…”
Section: Word Frequency (mentioning)
confidence: 99%
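The Zipf scale mentioned in the statement above maps raw counts onto a logarithmic, corpus-size-standardized measure (log10 frequency per billion words, typically running from about 1 to 7). A minimal sketch of the conversion, without the smoothing term used in the original Zipf-scale proposal; the corpus size is an illustrative figure, not taken from this page:

```python
import math

def zipf_value(count: int, corpus_size_tokens: int) -> float:
    """Zipf scale: log10(frequency per million words) + 3,
    i.e. log10 of frequency per billion words.
    (Simplified: omits the Laplace smoothing of Van Heuven et al., 2014.)"""
    freq_per_million = count / corpus_size_tokens * 1_000_000
    return math.log10(freq_per_million) + 3

# A word occurring 510 times in a 51-million-token corpus
# is 10 per million -> Zipf 4.0 (a mid-frequency word).
print(zipf_value(510, 51_000_000))  # -> 4.0
```

Because the scale is already logarithmic, a linear regression of reading time on Zipf values directly captures the logarithmic frequency effect the statement describes.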