Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2018
DOI: 10.18653/v1/p18-1027

Sharp Nearby, Fuzzy Far Away: How Neural Language Models Use Context

Abstract: We know very little about how neural language models (LM) use prior linguistic context. In this paper, we investigate the role of context in an LSTM LM, through ablation studies. Specifically, we analyze the increase in perplexity when prior context words are shuffled, replaced, or dropped. On two standard datasets, Penn Treebank and WikiText-2, we find that the model is capable of using about 200 tokens of context on average, but sharply distinguishes nearby context (recent 50 tokens) from the distant history…
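The perturbation setup the abstract describes can be summarized in a few lines. Below is a minimal, hypothetical sketch, not the authors' code: a toy untrained PyTorch LSTM stands in for a trained LM, and the helper names `perplexity` and `perturb` are illustrative assumptions. It shuffles, drops, or replaces the context tokens beyond the most recent 50 and reports the resulting change in perplexity.

```python
# Hypothetical sketch of the context-perturbation ablation: perturb the
# distant prefix of the context (shuffle / drop / replace) and compare
# perplexity against the intact context. Model and data are stand-ins.
import math
import torch
import torch.nn as nn

torch.manual_seed(0)
VOCAB, EMB, HID = 1000, 32, 64

class LSTMLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, EMB)
        self.lstm = nn.LSTM(EMB, HID, batch_first=True)
        self.out = nn.Linear(HID, VOCAB)

    def forward(self, ids):                       # ids: (batch, seq)
        h, _ = self.lstm(self.emb(ids))
        return self.out(h)                        # logits: (batch, seq, vocab)

def perplexity(model, context, targets):
    """Perplexity of `targets` given a (possibly perturbed) `context` prefix."""
    ids = torch.cat([context, targets], dim=1)
    with torch.no_grad():
        logits = model(ids)[:, context.size(1) - 1 : -1]  # positions predicting targets
    nll = nn.functional.cross_entropy(
        logits.reshape(-1, VOCAB), targets.reshape(-1))
    return math.exp(nll.item())

def perturb(context, mode, boundary=50):
    """Perturb everything *before* the last `boundary` tokens of the context."""
    far, near = context[:, :-boundary], context[:, -boundary:]
    if mode == "shuffle":
        far = far[:, torch.randperm(far.size(1))]
    elif mode == "drop":
        far = far[:, :0]                          # remove distant history entirely
    elif mode == "replace":
        far = torch.randint(0, VOCAB, far.shape)  # random tokens instead of history
    return torch.cat([far, near], dim=1)

model = LSTMLM().eval()
context = torch.randint(0, VOCAB, (1, 200))       # ~200 tokens of prior context
targets = torch.randint(0, VOCAB, (1, 20))
base = perplexity(model, context, targets)
for mode in ("shuffle", "drop", "replace"):
    ppl = perplexity(model, perturb(context, mode), targets)
    print(f"{mode:8s} Δppl = {ppl - base:+.2f}")
```

With a trained model, the paper's finding would show up as small perplexity deltas for perturbations beyond roughly 200 tokens, and as much larger deltas when the most recent 50 tokens are disturbed.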

Cited by 232 publications (188 citation statements) · References 13 publications
“…Since this work aims at investigating and gaining an understanding of the kinds of information a generative neural response model learns to use, the most relevant pieces of work are where similar analyses have been carried out to understand the behavior of neural models in other settings. An investigation into how LSTM-based unconditional language models use available context was carried out by Khandelwal et al. (2018). They empirically demonstrate that models are sensitive to perturbations only in the nearby context and typically use only about 150 words of context.…”
Section: Related Work
confidence: 99%
“…Intuitively, words that are closer in a sentence should have a stronger correlation. This has been verified in [6]. Thus, it is promising to design a new word embedding method that not only captures contextual information but also models the dynamics of a word sequence.…”
Section: Introduction
confidence: 75%
“…Although RNNs are widely adopted, they still have some drawbacks: backpropagation through time often suffers from vanishing and exploding gradients [14], and training is hard to parallelize because of its sequential operations. Moreover, previous work has shown that LSTMs struggle with long contexts (say, more than 200 context words) [15], which hinders their wider adoption.…”
Section: A. From Vanilla RNN to the Transformer
confidence: 99%
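As a quick, hypothetical illustration of the vanishing-gradient point in the statement above (a toy vanilla RNN with arbitrary sizes, not code from either paper), one can measure how much gradient from a loss at the final time step reaches each earlier input position:

```python
# Hypothetical illustration of gradient vanishing in BPTT: how much gradient
# from a loss at the last step flows back to each input time step of an RNN.
import torch
import torch.nn as nn

torch.manual_seed(0)
T, D = 200, 32
rnn = nn.RNN(D, D, batch_first=True)

x = torch.randn(1, T, D, requires_grad=True)
out, _ = rnn(x)
out[:, -1].sum().backward()                 # loss depends only on the final step

norms = x.grad.squeeze(0).norm(dim=1)       # gradient norm per input time step
for t in (0, 50, 100, 150, 199):
    print(f"step {t:3d}: |dL/dx_t| = {norms[t].item():.2e}")
# Norms typically decay sharply for early steps, showing why plain BPTT
# struggles to carry learning signal across long contexts.
```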