Proceedings of the 2nd Workshop on Evaluating Vector Space Representations for NLP 2017
DOI: 10.18653/v1/w17-5307

Recurrent Neural Network-Based Sentence Encoder with Gated Attention for Natural Language Inference

Abstract: The RepEval 2017 Shared Task aims to evaluate natural language understanding models for sentence representation, in which a sentence is represented as a fixed-length vector with neural networks and the quality of the representation is tested with a natural language inference task. This paper describes our system (alpha), which is ranked among the top in the Shared Task on both the in-domain test set (obtaining a 74.9% accuracy) and the cross-domain test set (also attaining a 74.9% accuracy), demonstrating that…
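
The abstract describes a sentence encoder that maps a sentence to a single fixed-length vector and is evaluated through natural language inference. As a rough illustration of that idea, the following is a minimal PyTorch sketch of a BiLSTM encoder with a gated-attention-style pooling; the module name, dimensions, and the exact form of the gating are assumptions, not the authors' released implementation.

```python
# Hedged sketch of a gated-attention sentence encoder for NLI; the scalar
# gate used as an attention weight is an assumption about the pooling scheme.
import torch
import torch.nn as nn


class GatedAttentionEncoder(nn.Module):
    """Encode a token sequence into one fixed-length vector."""

    def __init__(self, vocab_size, embed_dim=300, hidden_dim=300):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.bilstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                              bidirectional=True)
        # One scalar gate per time step, used as an attention weight.
        self.gate = nn.Linear(2 * hidden_dim, 1)

    def forward(self, tokens, mask):
        # tokens: (batch, seq_len) int ids; mask: (batch, seq_len) 0/1
        states, _ = self.bilstm(self.embed(tokens))            # (B, T, 2H)
        scores = self.gate(states).squeeze(-1)                 # (B, T)
        scores = scores.masked_fill(mask == 0, -1e9)
        weights = torch.softmax(scores, dim=-1)                # attention over T
        attended = torch.bmm(weights.unsqueeze(1), states).squeeze(1)  # (B, 2H)
        pooled, _ = states.masked_fill(mask.unsqueeze(-1) == 0, -1e9).max(dim=1)
        return torch.cat([attended, pooled], dim=-1)           # fixed-length vector
```

In an NLI setup, both the premise and the hypothesis would be passed through this shared encoder and the two resulting vectors combined as input to a three-way entailment classifier.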


Cited by 76 publications (37 citation statements)
References 20 publications
“…Sentence-encoding based models use the Siamese architecture (Bromley et al., 1993; Chen et al., 2017b) shown in Figure 2 (a). Parameter-tied neural networks are applied to encode both the context and the response.…”
Section: Sentence-encoding Based Methods (mentioning)
confidence: 99%
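
As a rough illustration of the parameter-tied (Siamese) setup in the excerpt, the sketch below applies one encoder instance, and hence one set of weights, to both inputs and classifies the pair from the concatenated vectors. The class name, classifier sizes, and the plain concatenation are assumptions, not details taken from the cited papers.

```python
# Hedged sketch of a Siamese (parameter-tied) sentence-pair model: the same
# encoder weights are applied to both inputs (e.g. context and response).
import torch
import torch.nn as nn


class SiameseMatcher(nn.Module):
    def __init__(self, encoder, encoding_dim, num_classes):
        super().__init__()
        self.encoder = encoder                  # shared: one set of weights
        self.classifier = nn.Sequential(
            nn.Linear(2 * encoding_dim, 300),
            nn.ReLU(),
            nn.Linear(300, num_classes),
        )

    def forward(self, left, left_mask, right, right_mask):
        u = self.encoder(left, left_mask)       # e.g. the context
        v = self.encoder(right, right_mask)     # e.g. the response
        return self.classifier(torch.cat([u, v], dim=-1))
```

Here `encoder` is any module that maps a token sequence and its mask to a fixed-length vector, such as the encoder sketched above.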
“…We evaluate the proposed AR-Tree on three tasks: natural language inference, sentence sentiment analysis, and author profiling. We set α = 0.1, λ = 1e-5 in Eq. 4 through all experiments.…”

Results table inlined in the excerpt (model, parameter count, accuracy %):

  Model                                                            #Params  Acc. (%)
  (Bowman et al., 2016)                                            3.7m     83.2
  300D NSE (Munkhdalai and Yu, 2017a)                              6.3m     84.8
  300D NTI-SLSTM-LSTM (Munkhdalai and Yu, 2017b)                   4.0m     83.4
  300D Gumbel Tree-LSTM (Choi, Yoo, and Lee, 2017)                 2.9m     85.0
  300D Self-Attentive (Lin et al., 2017)                           4.1m     84.4
  300D Tf-idf Tree-LSTM (Ours)                                     3.5m     84.5
  300D AR-Tree (Ours)                                              3.6m     85.5
  600D Gated-Attention BiLSTM (Chen et al., 2017)                  11.6m    85.5
  300D Decomposable attention (Parikh et al., 2016)                582k     86.8
  300D NTI-SLSTM-LSTM global attention (Munkhdalai and Yu, 2017b)  3.2m     87.3
  300D Structured Attention (Kim et al., 2017)                     2.4m     86.8

Section: Methods (mentioning)
confidence: 99%
“…In this study, we apply the one used in the attention module proposed in [16]. Dimension-wise features-based matching [43] is commonly used in many models such as [47], [48] for the task of natural language inference. The output of the matching step is scalar; therefore, we use a scoring layer which yields the input scalar value as it is.…”
Section: Reduce-match Models (mentioning)
confidence: 99%
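
The excerpt refers to dimension-wise feature-based matching between two sentence vectors. A common variant in NLI work combines the two vectors with their element-wise difference and product; the sketch below shows that variant reduced to a single matching score. The feature set and the small scoring MLP are assumptions and may differ from the formulations in [43], [47], [48].

```python
# Hedged sketch of dimension-wise matching between two sentence vectors,
# reduced to one scalar score per pair; the feature set and scorer are assumed.
import torch
import torch.nn as nn


def dimension_wise_match(u: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """Combine two (batch, dim) sentence vectors dimension by dimension."""
    return torch.cat([u, v, torch.abs(u - v), u * v], dim=-1)


class ScalarScorer(nn.Module):
    """Reduce the matching features to a single matching score per pair."""

    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(4 * dim, 128), nn.ReLU(),
                                  nn.Linear(128, 1))

    def forward(self, u, v):
        return self.proj(dimension_wise_match(u, v)).squeeze(-1)  # (batch,)
```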