Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021
DOI: 10.18653/v1/2021.acl-long.76
A Targeted Assessment of Incremental Processing in Neural Language Models and Humans

Abstract: We present a targeted, scaled-up comparison of incremental processing in humans and neural language models by collecting by-word reaction time data for sixteen different syntactic test suites across a range of structural phenomena. Human reaction time data comes from a novel online experimental paradigm called the Interpolated Maze task. We compare human reaction times to by-word probabilities for four contemporary language models, with different architectures and trained on a range of data set sizes. We find …
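The by-word quantity compared against human reaction times in this line of work is surprisal, the negative log probability a language model assigns to each word in context. A minimal sketch, using a hypothetical toy bigram model in place of the neural language models the paper evaluates:

```python
import math

# Hypothetical bigram model: P(word | previous word). A stand-in for the
# neural LMs compared in the paper; the probabilities are made up.
BIGRAM_PROBS = {
    ("<s>", "the"): 0.5,
    ("the", "dog"): 0.2,
    ("dog", "barked"): 0.1,
}

def by_word_surprisal(words, probs, floor=1e-10):
    """Return surprisal -log2 P(w_i | context) for each word, in bits.

    Higher surprisal is the model's prediction of greater incremental
    processing difficulty at that word.
    """
    surprisals = []
    prev = "<s>"
    for w in words:
        p = probs.get((prev, w), floor)  # fall back to a tiny floor probability
        surprisals.append(-math.log2(p))
        prev = w
    return surprisals

# Surprisal of "the" sentence-initially: -log2(0.5) = 1.0 bit.
profile = by_word_surprisal(["the", "dog", "barked"], BIGRAM_PROBS)
```

The resulting per-word profile is what gets aligned with by-word reaction times from paradigms like the Interpolated Maze.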

Cited by 19 publications (22 citation statements) · References 21 publications
“…The present study argues that autoregressive models do not (uniformly) process pronouns like humans. We showed that models fail to capture the qualitative patterns of human incremental coreference processing, in addition to underestimating processing costs in constructions already noted in the literature (see van Schijndel and Linzen, 2021; Wilcox et al., 2021b). Models appear to learn only aspects of Principle B that have predictable reflexes in training data.…”
Section: Discussion
confidence: 72%
“…Recent work has placed increased attention on finer-grained comparisons between neural models and humans (e.g., van Schijndel and Linzen, 2021; Wilcox et al., 2021b; Paape and Vasishth, 2022). The growing consensus is that neural models underestimate the processing costs seen with humans, while nonetheless capturing the broad patterns (see Wilcox et al., 2021b).…”
Section: Introduction
confidence: 99%
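The "finer-grained comparisons" in such studies typically relate model surprisal to reaction times through a linking function. A minimal sketch with ordinary least squares and hypothetical data; real analyses use mixed-effects regression with covariates such as word length, frequency, and spillover:

```python
def fit_linear(xs, ys):
    """Ordinary least squares for y = a + b*x with a single predictor.

    A deliberately minimal stand-in for the linking functions used to
    relate LM surprisal to human reaction times.
    """
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx          # slope: estimated ms of slowdown per bit of surprisal
    a = my - b * mx        # intercept: baseline reaction time
    return a, b

# Hypothetical per-word surprisals (bits) and reaction times (ms).
surprisal = [1.0, 2.3, 4.6, 3.1]
rt = [320.0, 355.0, 410.0, 372.0]
a, b = fit_linear(surprisal, rt)
```

A model that "underestimates processing costs" in this framing predicts smaller surprisal differences between conditions than the fitted slope would need to reproduce the observed reaction-time differences.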
“…Unlike convolutional neural networks, whose architectural design principles are roughly inspired by biological vision [Lindsay, 2021], the design of current neural network language models is largely uninformed by psycholinguistics and neuroscience. And yet, there is an ongoing effort to adopt and adapt neural network language models to serve as computational hypotheses of how humans process language, making use of a variety of different architectures, training corpora, and training tasks [e.g., Wehbe et al., 2014, Toneva and Wehbe, 2019, Heilbron et al., 2020, Jain et al., 2020, Lyu et al., 2021, Schrimpf et al., 2021, Wilcox et al., 2021, Goldstein et al., 2022, Caucheteux and King, 2022]. We found that recurrent neural networks make markedly human-inconsistent predictions once pitted against transformer-based neural networks.…”
Section: Implications For Artificial Neural Network Language Models A...
confidence: 89%
“…Another recent variant of the Maze task is the Interpolated Maze (I-maze), which uses a mix of real-word distractors (generated via the A-maze process) and non-word distractors (Vani, Wilcox, and Levy 2021; Wilcox, Vani, and Levy 2021). The presence of real-word distractors encourages close attention to the sentential context, while non-words can be used as distractors where the word in the sentence is itself ungrammatical or highly unexpected, and/or where it is important that the predictability of the distractor in context is perfectly balanced (at zero) across all experimental conditions.…”
Section: Maze
confidence: 99%
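The trial structure described above — each position pairing the sentence word with either a real-word or a non-word distractor — can be sketched as follows. All names and inputs are hypothetical, and the real-word distractors are assumed to come from an A-maze-style generation step that is not implemented here:

```python
import random

def build_interpolated_maze_trials(sentence, real_distractors, nonword_positions,
                                   nonwords=("blorp", "trun", "fexid"), seed=0):
    """Illustrative sketch of Interpolated-Maze trial assembly.

    At each position the participant must choose between the sentence word
    and a distractor. Positions in `nonword_positions` receive a non-word
    distractor (useful where the critical word is itself ungrammatical, or
    where distractor predictability must be exactly zero across conditions);
    all other positions receive the supplied real-word distractors.
    """
    rng = random.Random(seed)
    trials = []
    for i, word in enumerate(sentence):
        if i in nonword_positions:
            distractor = rng.choice(nonwords)
        else:
            distractor = real_distractors[i]
        trials.append((word, distractor))
    return trials

# Hypothetical item: non-word distractor only at the critical final word.
trials = build_interpolated_maze_trials(
    ["the", "dog", "barked"], ["or", "sky", "melted"], nonword_positions={2})
```

The per-word choice latencies collected from such trials are the reaction-time measure the paper compares against model surprisal.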