To learn mappings from words to referents, children must integrate co-occurrence information across individually ambiguous pairs of scenes and utterances, a challenge known as cross-situational word learning. In machine learning, recent multimodal neural networks have been shown to learn meaningful visual-linguistic mappings from cross-situational data, as needed to solve problems such as image captioning and visual question answering. These networks are potentially appealing as cognitive models because they can learn from raw visual and linguistic stimuli, something previous cognitive models have not addressed. In this paper, we examine whether recent machine learning approaches can help explain various behavioral phenomena from the psychological literature on cross-situational word learning. We consider two variants of a multimodal neural network architecture and evaluate them against seven phenomena associated with cross-situational word learning, and with word learning more generally. Our results show that these networks can learn word-referent mappings from a single epoch of training, matching the amount of training found in cross-situational word learning experiments. Additionally, these networks capture some, but not all, of the phenomena we studied, with all of the failures related to reasoning via mutual exclusivity. These results provide insight into which phenomena arise naturally from relatively generic neural network learning algorithms and which word learning phenomena require additional inductive biases.