Word Frequency Does Not Predict Grammatical Knowledge in Language Models

Yu, Charles; Sie, Ryan; Tedeschi, Nicolas; Bergen, Leon

doi:10.18653/v1/2020.emnlp-main.331

“…Noun frequency. We find similar evidence, like Yu et al (2020), that BERT did not perform better on nouns that were more frequent in the training set. Figure 8 shows these results: for each subject (we consider both singular and plural forms as a single subject), we plot that subject's error rate against its frequency in the training data.…”

Section: C2 Comparison With Prior Work (supporting)

confidence: 71%
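The analysis described in the statement above (per-subject error rate compared against training-set frequency) could be sketched roughly as follows. This is a minimal illustration, not the citing authors' code: the function names `error_rates`, `average_ranks`, and `spearman` are hypothetical, the input format (pairs of subject and a correctness flag) is an assumption, and Spearman rank correlation is one reasonable choice of statistic that the source does not specify.

```python
from collections import defaultdict
from math import sqrt


def error_rates(trials):
    """Map each subject noun to its error rate.

    `trials` is an iterable of (subject, is_correct) pairs; this input
    format is an assumption made for illustration.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for subject, is_correct in trials:
        totals[subject] += 1
        if not is_correct:
            errors[subject] += 1
    return {s: errors[s] / totals[s] for s in totals}


def average_ranks(values):
    """Return 1-based ranks, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(vx * vy)
```

Given a frequency table for the same subjects, one would pass the frequencies and the matching error rates to `spearman`; a value near zero would be consistent with the "no frequency effect" reading, although a proper analysis would also need a significance test.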

“…There has been substantial prior work on the ability of language models to perform abstract syntactic processing tasks (Hu et al, 2020) (see Linzen and Baroni (2020) for a review). On SVA specifically, Goldberg (2019) found that BERT achieves high accuracy on both natural sentences (97%) and nonce sentences (83%), and that error rate was independent of the number of "distractor" words between the subject and verb; Yu et al (2020) showed that language models do not exhibit better grammatical knowledge of more frequent nouns. Other work has found that BERT's performance is sensitive to factors that may suggest item-specific learning; Chaves and Richter (2021) found that BERT's performance on number agreement is sensitive to the verb, across seven different verbs, and Newman et al (2021) found that language models performed better on verbs that they predicted were likely in context.…”

Section: Syntactic Reasoning In LMs (mentioning)

confidence: 99%

Frequency Effects on Syntactic Rule Learning in Transformers

Wei¹, Garrette², Linzen³ et al. 2021

Preprint


Pre-trained language models perform well on a variety of linguistic tasks that require symbolic reasoning, raising the question of whether such models implicitly represent abstract symbols and rules. We investigate this question using the case study of BERT's performance on English subject-verb agreement. Unlike prior work, we train multiple instances of BERT from scratch, allowing us to perform a series of controlled interventions at pre-training time. We show that BERT often generalizes well to subject-verb pairs that never occurred in training, suggesting a degree of rule-governed behavior. We also find, however, that performance is heavily influenced by word frequency, with experiments showing that both the absolute frequency of a verb form, as well as the frequency relative to the alternate inflection, are causally implicated in the predictions BERT makes at inference time. Closer analysis of these frequency effects reveals that BERT's behavior is consistent with a system that correctly applies the SVA rule in general but struggles to overcome strong training priors and to estimate agreement features (singular vs. plural) on infrequent lexical items.


“…Similarly, we observe that models succeed on our MW metric indicating that they correctly inflect verbs with high in-context probability under the model. Relatedly, Yu et al (2020) investigate the nouns used in TSE minimal pairs and find that language model performance at subject-verb number agreement is uncorrelated with unigram probability of the noun. We instead focus on model-estimated in-context probability of the verb in minimal pairs, finding that model performance increases with the model probability.…”

Section: Lexical Choice In Syntactic Evaluation (mentioning)

confidence: 99%

Refining Targeted Syntactic Evaluation of Language Models

Newman¹, Ang², Gong³ et al. 2021

Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Langua


Targeted syntactic evaluation of subject-verb number agreement in English (TSE) evaluates language models' syntactic knowledge using hand-crafted minimal pairs of sentences that differ only in the main verb's conjugation. The method evaluates whether language models rate each grammatical sentence as more likely than its ungrammatical counterpart. We identify two distinct goals for TSE. First, evaluating the systematicity of a language model's syntactic knowledge: given a sentence, can it conjugate arbitrary verbs correctly? Second, evaluating a model's likely behavior: given a sentence, does the model concentrate its probability mass on correctly conjugated verbs, even if only on a subset of the possible verbs? We argue that current implementations of TSE do not directly capture either of these goals, and propose new metrics to capture each goal separately. Under our metrics, we find that TSE overestimates systematicity of language models, but that models score up to 40% better on verbs that they predict are likely in context.
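The basic TSE check described in this abstract (does the model rate the grammatical sentence of each minimal pair as more likely than the ungrammatical one?) can be sketched as below. This is only an illustration under stated assumptions: `tse_accuracy` and `toy_score` are hypothetical names, and `toy_score` is a crude stand-in heuristic rather than a language model. In practice the scorer would return a sentence log-probability from a real model. The sketch does not implement the new metrics the abstract proposes.

```python
from typing import Callable, Iterable, Tuple

# (grammatical sentence, ungrammatical sentence) differing only in the verb form
MinimalPair = Tuple[str, str]


def tse_accuracy(pairs: Iterable[MinimalPair],
                 score: Callable[[str], float]) -> float:
    """Fraction of pairs where the grammatical sentence scores strictly higher."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one minimal pair")
    correct = sum(1 for good, bad in pairs if score(good) > score(bad))
    return correct / len(pairs)


def toy_score(sentence: str) -> float:
    """Hypothetical stand-in for a model's sentence log-probability.

    Assumes three-word sentences of the form "The <subject> <verb>." and
    guesses number from a trailing "s", which is deliberately naive.
    """
    words = sentence.lower().rstrip(".").split()
    subject, verb = words[1], words[-1]
    # Heuristic: exactly one of subject and verb should end in "s".
    agrees = subject.endswith("s") != verb.endswith("s")
    return 0.0 if agrees else -1.0


PAIRS = [
    ("The dog runs.", "The dog run."),
    ("The dogs run.", "The dogs runs."),
    ("The glass breaks.", "The glass break."),  # the heuristic misjudges "glass"
]

print(tse_accuracy(PAIRS, toy_score))
```

The third pair shows how a scorer that relies on surface cues can fail on particular lexical items, which is the kind of item-level variation the citing papers discuss.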


“…There has been substantial prior work on the ability of language models to perform abstract syntactic processing tasks (Hu et al, 2020) (see Linzen and Baroni (2020) for a review). On SVA specifically, Goldberg (2019) found that BERT achieves high accuracy on both natural sentences (97%) and nonce sentences (83%), and that error rate was independent of the number of "distractor" words between the subject and verb; Yu et al (2020) showed that language models do not exhibit better grammatical knowledge of more frequent nouns. Other work has found that BERT's performance is sensitive to factors that may suggest item-specific learning; Chaves and Richter (2021) found that BERT's performance on number agreement is sensitive to the verb, across seven different verbs, and Newman et al (2021) found that language models performed better on verbs that they predicted were likely in context.…”

Section: Syntactic Reasoning In LMs (mentioning)

confidence: 99%

Frequency Effects on Syntactic Rule Learning in Transformers

Wei¹, Garrette², Linzen³ et al. 2021

Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing



Word Frequency Does Not Predict Grammatical Knowledge in Language Models

Cited by 8 publications

References 23 publications
