Sharp Nearby, Fuzzy Far Away: How Neural Language Models Use Context

Khandelwal, Urvashi; He, He; Qi, Peng; Jurafsky, Dan

doi:10.18653/v1/p18-1027

Cited by 232 publications

(188 citation statements)

References 13 publications

Supporting

Mentioning

177

Contrasting

Unclassified

“…Since this work aims at investigating and gaining an understanding of the kinds of information a generative neural response model learns to use, the most relevant pieces of work are where similar analyses have been carried out to understand the behavior of neural models in other settings. An investigation into how LSTM based unconditional language models use available context was carried out by Khandelwal et al (2018). They empirically demonstrate that models are sensitive to perturbations only in the nearby context and typically use only about 150 words of context.…”

Section: Related Work (mentioning)

confidence: 99%

Do Neural Dialog Systems Use the Conversation History Effectively? An Empirical Study

Sankar,

Subramanian,

Pal

et al. 2019

Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Neural generative models have become increasingly popular when building conversational agents. They offer flexibility, can be easily adapted to new domains, and require minimal domain engineering. A common criticism of these systems is that they seldom understand or use the available dialog history effectively. In this paper, we take an empirical approach to understanding how these models use the available dialog history by studying the sensitivity of the models to artificially introduced unnatural changes or perturbations to their context at test time. We experiment with 10 different types of perturbations on 4 multi-turn dialog datasets and find that commonly used neural dialog architectures like recurrent and transformer-based seq2seq models are rarely sensitive to most perturbations such as missing or reordering utterances, shuffling words, etc. Also, by open-sourcing our code, we believe that it will serve as a useful diagnostic tool for evaluating dialog systems in the future.

“…Intuitively, words that are closer in a sentence should have stronger correlation. This has been verified in [6]. Thus, it is promising to design a new word embedding method that not only captures the context information but also models the dynamics in a word sequence.…”

Section: Introduction (mentioning)

confidence: 75%

Post-Processing of Word Representations via Variance Normalization and Dynamic Embedding

Chen

Wang

Kuo

2019

2019 IEEE International Conference on Multimedia and Expo (ICME)

Language processing becomes more and more important in multimedia processing. Although embedded vector representations of words offer impressive performance on many natural language processing (NLP) applications, the information of ordered input sequences is lost to some extent if only context-based samples are used in the training. For further performance improvement, two new post-processing techniques, called post-processing via variance normalization (PVN) and post-processing via dynamic embedding (PDE), are proposed in this work. The PVN method normalizes the variance of principal components of word vectors, while the PDE method learns orthogonal latent variables from ordered input sequences. The PVN and the PDE methods can be integrated to achieve better performance. We apply these post-processing techniques to several popular word embedding methods to yield their post-processed representations. Extensive experiments are conducted to demonstrate the effectiveness of the proposed post-processing techniques.

“…Although RNNs are widely adopted, there are still some drawbacks on RNNs: Back propagation procedure through time usually suffers from gradient vanishing and explosion [14]; The training process is also hard to be parallelized because of the consecutive operations. What's more, previous work has showed that LSTM is hard to tackle with longer sentences (say, more than 200 context words) [15], which hinders their further spreading.…”

Section: A From Vanilla RNN To the Transformer (mentioning)

confidence: 99%

An Augmented Transformer Architecture for Natural Language Generation Tasks

Adele

Liu

et al. 2019

2019 International Conference on Data Mining Workshops (ICDMW)

The Transformer based neural networks have been showing significant advantages on most evaluations of various natural language processing and other sequence-to-sequence tasks due to its inherent architecture based superiorities. Although the main architecture of the Transformer has been continuously being explored, little attention was paid to the positional encoding module. In this paper, we enhance the sinusoidal positional encoding algorithm by maximizing the variances between encoded consecutive positions to obtain additional promotion. Furthermore, we propose an augmented Transformer architecture encoded with additional linguistic knowledge, such as the Part-of-Speech (POS) tagging, to boost the performance on some natural language generation tasks, e.g., the automatic translation and summarization tasks. Experiments show that the proposed architecture attains constantly superior results compared to the vanilla Transformer.
