Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021
DOI: 10.18653/v1/2021.acl-long.70

What Context Features Can Transformer Language Models Use?

Abstract: Transformer-based language models benefit from conditioning on contexts of hundreds to thousands of previous tokens. What aspects of these contexts contribute to accurate model prediction? We describe a series of experiments that measure usable information by selectively ablating lexical and structural information in transformer language models trained on English Wikipedia. In both mid- and long-range contexts, we find that several extremely destructive context manipulations, including shuffling word order within…

Cited by 20 publications (15 citation statements)
References 33 publications
“…Specifically, we examined versions made up of: (i) all the content words, that is, nouns, verbs, adjectives, and adverbs ( KeepContentW ); (ii) nouns, verbs, and adjectives ( KeepNVA ); (iii) nouns and verbs ( KeepNV ); (iv) nouns ( KeepN ); and (v) only the function words ( KeepFunctionW ). Following O’Connor and Andreas (2021) , we included pronouns and proper names in the set of nouns. Note also that because not all the sentences had adverbs and/or adjectives, some pairs of the conditions (i), (ii), (iii), and (iv) could be identical for some sentences.…”
Section: Methods
confidence: 99%
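The "Keep*" ablations quoted above amount to filtering a sentence by part-of-speech tag. A minimal sketch (not the authors' released code; the tag names follow the Universal POS convention, and the example sentence is hand-tagged for illustration):

```python
# Tag sets for the ablation conditions described above.
CONTENT_TAGS = {"NOUN", "VERB", "ADJ", "ADV"}  # KeepContentW
NVA_TAGS = {"NOUN", "VERB", "ADJ"}             # KeepNVA
NV_TAGS = {"NOUN", "VERB"}                     # KeepNV
N_TAGS = {"NOUN"}                              # KeepN (pronouns and proper
                                               # names are counted as nouns)

def keep_tags(tagged_sentence, tag_set):
    """Drop every word whose POS tag is not in tag_set, preserving word order."""
    return [word for word, tag in tagged_sentence if tag in tag_set]

# A hand-tagged example sentence.
sentence = [("the", "DET"), ("old", "ADJ"), ("dog", "NOUN"),
            ("slept", "VERB"), ("very", "ADV"), ("soundly", "ADV")]

keep_tags(sentence, NV_TAGS)  # -> ["dog", "slept"]
```

In practice the tags would come from an automatic tagger rather than hand annotation; note also the quoted caveat that conditions can coincide on sentences lacking adjectives or adverbs (here, KeepNVA and KeepNV coincide only if "old" is removed).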
“…The perturbation manipulation conditions that we use in the current work are motivated by prior theorizing in language research and/or past empirical findings from both neuroscience and natural language processing (NLP). The perturbation manipulations include (i) word-order manipulations of varying severity that preserve or destroy local dependency structure (following Mollica et al, 2020 ), allowing us to investigate the effect of word order degradation while controlling for local word co-occurrence statistics; (ii) information-loss manipulations with deletion of words of different parts of speech (following O’Connor & Andreas, 2021 ), allowing us to investigate loss of information from particular classes of words; (iii) semantic-distance manipulations with sentence substitutions that relate to the meaning of the original sentence to varying degrees (inspired by Pereira et al, 2018 ), allowing us to investigate loss of semantic and more general topical information while retaining sentence well-formedness. As a baseline length-matched control condition, we include a random word list, where each word is substituted with a different random word.…”
Section: Introduction
confidence: 99%
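The word-order manipulations of "varying severity" described above can be graded by shuffling within windows of different sizes: small windows destroy global order while largely preserving local co-occurrence statistics, and a window spanning the whole context destroys both. A minimal sketch of this idea (an illustrative implementation, not the authors' code):

```python
import random

def shuffle_within_windows(tokens, window, seed=0):
    """Shuffle token order independently inside each fixed-size window.

    Small `window` values degrade global word order while keeping local
    co-occurrence statistics largely intact; window == len(tokens) is a
    full global shuffle.
    """
    rng = random.Random(seed)  # seeded for reproducible ablations
    out = []
    for i in range(0, len(tokens), window):
        chunk = tokens[i:i + window]
        rng.shuffle(chunk)
        out.extend(chunk)
    return out
```

Each window keeps exactly its original multiset of tokens, so any change in model loss under this manipulation is attributable to order, not content.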
“…This distinction has been explicitly noted before, e.g. by Bartlett (1953), Pimentel et al. (2019, 2020b), Bugliarello et al. (2020), McAllester and Stratos (2020), Torroba Hennigen et al. (2020), Fernandes et al. (2021), and O'Connor and Andreas (2021). In those works, though, it was usually interpreted as a computational approximation to the truth-MI (or to V-information (Xu et al., 2020), which is discussed later in the paper).…”
mentioning
confidence: 85%