Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics 2021
DOI: 10.18653/v1/2021.cmcl-1.2

Human Sentence Processing: Recurrence or Attention?

Abstract: Recurrent neural networks (RNNs) have long been an architecture of interest for computational models of human sentence processing. The recently introduced Transformer architecture outperforms RNNs on many natural language processing tasks, but little is known about its ability to model human language processing. We compare Transformer- and RNN-based language models' ability to account for measures of human reading effort. Our analysis shows Transformers to outperform RNNs in explaining self-paced reading times and neural activity during reading English sentences, challenging the widely held idea that human sentence processing involves recurrent and immediate processing and provides evidence for cue-based retrieval.
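The comparison described in the abstract rests on a standard surprisal-based analysis: a language model assigns each word a surprisal (negative log probability in context), and word-level surprisal is then related to behavioural measures such as self-paced reading times. The sketch below is not the authors' code; it assumes GPT-2 via the HuggingFace transformers library, an arbitrary example sentence, and simulated reading times purely to illustrate the mechanics, whereas the paper fits its own trained RNN and Transformer models to real reading data.

```python
# Minimal sketch of a surprisal-based reading-time analysis. Assumptions not
# taken from the paper: the GPT-2 model, the example sentence, and the
# simulated reading times are illustrative only.
import numpy as np
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def token_surprisals(sentence: str):
    """Surprisal (in bits) of every token after the first, given its left context."""
    enc = tokenizer(sentence, return_tensors="pt")
    ids = enc["input_ids"][0]
    with torch.no_grad():
        logits = model(enc["input_ids"]).logits[0]        # [seq_len, vocab]
    log_probs = torch.log_softmax(logits, dim=-1)
    # Token i is predicted from the model's output at position i-1.
    surprisal = -log_probs[:-1].gather(1, ids[1:, None]).squeeze(1) / np.log(2)
    return list(zip(tokenizer.convert_ids_to_tokens(ids[1:].tolist()),
                    surprisal.tolist()))

surps = token_surprisals("The horse raced past the barn fell.")
x = np.array([s for _, s in surps])

# Simulated per-token reading times (ms): baseline + surprisal effect + noise.
rng = np.random.default_rng(0)
rts = 300.0 + 25.0 * x + rng.normal(0.0, 20.0, size=len(x))

# Simple least-squares fit; the question asked in the paper is how much of the
# variance in real reading measures each model's surprisal can explain.
slope, intercept = np.polyfit(x, rts, 1)
print(f"RT ~ {intercept:.1f} + {slope:.1f} * surprisal (bits)")
```

In the paper's setup, the same regression logic is applied to surprisal estimates from RNN- and Transformer-based models, and their goodness of fit to the human measures is compared.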

Cited by 67 publications (65 citation statements)
References 35 publications
“…They find that as long as the three types of models achieve a similar level of language modeling performance, there is no reliable difference in their predictive power. Merkx and Frank (2021) extend this study by comparing Transformer models against GRU models following similar experimental methods. The Transformer models are found to outperform the GRU models on explaining self-paced reading times and N400 measures but not eye-gaze durations.…”
Section: Related Work
confidence: 97%
“…Common optimization tasks for pretraining transformers, such as the masked LM task (Devlin et al., 2018), are quite similar to the word prediction tasks that are known to predict children's performance on other linguistic skills (Borovsky et al., 2012; Neuman et al., 2011; Gambi et al., 2020). Finally, TLMs tend to outperform other LMs in recent work modeling human reading times, eye-tracking data, and other psychological and psycholinguistic phenomena (Merkx and Frank, 2021; Schrimpf et al., 2020b,a; Hao et al., 2020; Bhatia and Richie, 2020; …”
Section: Related Work
confidence: 98%
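As a concrete illustration of the analogy drawn in the snippet above, a masked-LM query is essentially a cloze-style word prediction. The sketch below assumes bert-base-uncased via the HuggingFace transformers fill-mask pipeline and an arbitrary example sentence; neither is taken from the cited works.

```python
# Hedged sketch: masked-word prediction as a cloze task. Model choice
# (bert-base-uncased) and the example sentence are illustrative assumptions.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")
# Print the three most probable completions for the masked position.
for cand in fill("The children read a [MASK] before bedtime.")[:3]:
    print(f"{cand['token_str']:>10}  p={cand['score']:.3f}")
```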
“…Transformer architecture (Vaswani et al., 2017) has advanced the state of the art in a wide range of natural language processing (NLP) tasks (Devlin et al., 2019; Liu et al., 2019; …). Along with this, Transformers have become a major subject of research from the viewpoints of engineering (Rogers et al., 2020) and scientific studies (Merkx and Frank, 2021; Manning et al., 2020).…”
Section: Introduction
confidence: 99%