A Systematic Assessment of Syntactic Generalization in Neural Language Models

Hu, Jennifer; Gauthier, Jon; Qian, Peng; Wilcox, Ethan; Levy, Roger

doi:10.18653/v1/2020.acl-main.158

Cited by 104 publications

(135 citation statements)

References 44 publications

Citation statement types: Supporting; Mentioning (113); Contrasting; Unclassified

“…Much has been written about the ability of ANNs to learn number agreement (Linzen et al, 2016; Gulordava et al, 2018; Giulianelli et al, 2018), including their ability to maintain the dependency across different types of intervening material (Marvin and Linzen, 2018) and with coordinated noun phrases. Hu et al (2020) find that model architecture, rather than training data size, may contribute most to performance on number agreement and related tasks. Focusing on RNN models, Lakretz et al (2019) find evidence that number agreement is tracked by specific "number" units that work in concert with units that carry more general syntactic information like tree depth.…”

Section: Related Work (mentioning)

confidence: 88%

Structural Supervision Improves Few-Shot Learning and Syntactic Generalization in Neural Language Models

Wilcox, Qian, Futrell, et al. 2020

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Self Cite

Humans can learn structural properties about a word from minimal experience, and deploy their learned syntactic representations uniformly in different grammatical contexts. We assess the ability of modern neural language models to reproduce this behavior in English and evaluate the effect of structural supervision on learning outcomes. First, we assess few-shot learning capabilities by developing controlled experiments that probe models' syntactic nominal number and verbal argument structure generalizations for tokens seen as few as two times during training. Second, we assess invariance properties of learned representation: the ability of a model to transfer syntactic generalizations from a base context (e.g., a simple declarative active-voice sentence) to a transformed context (e.g., an interrogative sentence). We test four models trained on the same dataset: an n-gram baseline, an LSTM, and two LSTM variants trained with explicit structural supervision (Dyer et al., 2016; Charniak et al., 2016). We find that in most cases, the neural models are able to induce the proper syntactic generalizations after minimal exposure, often from just two examples during training, and that the two structurally supervised models generalize more accurately than the LSTM model. All neural models are able to leverage information learned in base contexts to drive expectations in transformed contexts, indicating that they have learned some invariance properties of syntax.

“…find that while LMs learn agreement phenomena at a similarly early stage, other phenomena require more data to learn. Finally, Hu et al (2020) find that adopting architectures that build in linguistic bias, such as RNNGs (Dyer et al, 2016), has a bigger effect on the acceptability task than increasing training data from 1M to 40M words.…”

Section: Related Work (mentioning)

confidence: 96%

“…For instance, an RNN classifier is capable of representing any function, but prefers ones that focus mostly on local relationships within the input sequence (Dhingra et al, 2018; Ravfogel et al, 2019). Some recent work seeks to design neural architectures that build in desirable inductive biases (Dyer et al, 2016; Battaglia et al, 2018), or compares the immutable biases of different architectures (Hu et al, 2020). However, inductive biases can also be learned by biological (Harlow, 1949) and artificial systems alike (Lake et al, 2017).…”

Section: Inductive Bias (mentioning)

confidence: 99%

Learning Which Features Matter: RoBERTa Acquires a Preference for Linguistic Generalizations (Eventually)

Warstadt, Zhang, Li, et al. 2020

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

One reason pretraining on self-supervised linguistic tasks is effective is that it teaches models features that are helpful for language understanding. However, we want pretrained models to learn not only to represent linguistic features, but also to use those features preferentially during fine-tuning. With this goal in mind, we introduce a new English-language diagnostic set called MSGS (the Mixed Signals Generalization Set), which consists of 20 ambiguous binary classification tasks that we use to test whether a pretrained model prefers linguistic or surface generalizations during fine-tuning. We pretrain RoBERTa models from scratch on quantities of data ranging from 1M to 1B words and compare their performance on MSGS to the publicly available RoBERTa BASE. We find that models can learn to represent linguistic features with little pretraining data, but require far more data to learn to prefer linguistic generalizations over surface ones. Eventually, with about 30B words of pretraining data, RoBERTa BASE does demonstrate a linguistic bias with some regularity. We conclude that while self-supervised pretraining is an effective way to learn helpful inductive biases, there is likely room to improve the rate at which models learn which features matter.

“…Recent work has suggested that LMs acquire abstract, often human-like, knowledge of syntax (e.g., Gulordava et al, 2018; Hu et al, 2020). Additionally, knowledge of grammatical and referential aspects linking a pronoun to its antecedent noun (reference) have been demonstrated for both transformer and long short-term memory architectures (Sorodoc et al, 2020).…”

Section: Introduction (mentioning)

confidence: 99%

Discourse structure interacts with reference but not syntax in neural language models

Davis, Schijndel 2020

Proceedings of the 24th Conference on Computational Natural Language Learning

Language models (LMs) trained on large quantities of text have been claimed to acquire abstract linguistic representations. Our work tests the robustness of these abstractions by focusing on the ability of LMs to learn interactions between different linguistic representations. In particular, we utilized stimuli from psycholinguistic studies showing that humans can condition reference (i.e. coreference resolution) and syntactic processing on the same discourse structure (implicit causality). We compared both transformer and long short-term memory LMs to find that, contrary to humans, implicit causality only influences LM behavior for reference, not syntax, despite model representations that encode the necessary discourse information. Our results further suggest that LM behavior can contradict not only learned representations of discourse but also syntactic agreement, pointing to shortcomings of standard language modeling.
