Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2016)
DOI: 10.18653/v1/n16-1014

A Diversity-Promoting Objective Function for Neural Conversation Models

Abstract: Sequence-to-sequence neural network models for generation of conversational responses tend to generate safe, commonplace responses (e.g., I don't know) regardless of the input. We suggest that the traditional objective function, i.e., the likelihood of output (response) given input (message), is unsuited to response generation tasks. Instead we propose using Maximum Mutual Information (MMI) as the objective function in neural models. Experimental results demonstrate that the proposed MMI models produce more diverse, interesting, and appropriate responses, yielding substantive gains in BLEU scores on two conversational datasets and in human evaluations.
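For context, the paper's two practical MMI decoding objectives can be restated compactly as follows, with S the input message, T the response, and λ a tunable weight:

```latex
% MMI-antiLM: penalize the unconditional likelihood of the response
\hat{T} = \arg\max_{T}\,\bigl\{\log p(T \mid S) - \lambda \log p(T)\bigr\}

% MMI-bidi: trade off forward and backward conditional likelihoods
\hat{T} = \arg\max_{T}\,\bigl\{(1-\lambda)\log p(T \mid S) + \lambda \log p(S \mid T)\bigr\}
```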

Cited by 1,695 publications (1,697 citation statements). References 27 publications.
“…• MMI-anti: a Seq2Seq model with a Maximum Mutual Information (MMI) criterion (implemented as an anti-language model) (Li et al., 2016a) in the decoding process, which reduces the probability of generating "safe responses".…”
Section: Baselines
Mentioning confidence: 99%
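A minimal sketch of how MMI-anti re-ranking could look at decoding time. The two scoring callables are assumptions: they stand in for a trained Seq2Seq model returning log p(response | message) and an unconditioned language model returning log p(response); the weight lam is a tunable hyperparameter.

```python
# MMI-antiLM re-ranking sketch: score = log p(T|S) - lam * log p(T).
# Subtracting the language-model term demotes generic, high-frequency
# ("safe") responses such as "I don't know".

def mmi_anti_lm_score(response, message, seq2seq_logprob, lm_logprob, lam=0.5):
    """Score one candidate response under the MMI-antiLM criterion."""
    return seq2seq_logprob(response, message) - lam * lm_logprob(response)

def rerank(candidates, message, seq2seq_logprob, lm_logprob, lam=0.5):
    """Pick the best response from an N-best list produced by beam search."""
    return max(
        candidates,
        key=lambda t: mmi_anti_lm_score(t, message, seq2seq_logprob, lm_logprob, lam),
    )

if __name__ == "__main__":
    # Toy stand-in scorers (real models would replace these dictionaries).
    fake_s2s = lambda t, m: {"i don't know": -4.0, "try the jazz bar on 5th": -6.0}[t]
    fake_lm = lambda t: {"i don't know": -3.0, "try the jazz bar on 5th": -11.0}[t]
    # Plain likelihood would prefer the safe response (-4.0 > -6.0); the
    # anti-LM penalty flips the ranking (-2.5 < -0.5).
    print(rerank(["i don't know", "try the jazz bar on 5th"],
                 "any plans tonight?", fake_s2s, fake_lm))
```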
“…Diversity Metrics: To measure the informativeness and diversity of the generated responses, we follow the dist-1 and dist-2 metrics proposed by Li et al. (2016a), and introduce a Novelty metric. The dist-1 (dist-2) metric is defined as the number of unique unigrams (bigrams).…”
Section: Evaluation Metrics
Mentioning confidence: 99%
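Note that dist-1/dist-2 are usually normalized: the count of distinct unigrams (bigrams) is divided by the total number of generated tokens. A minimal sketch under that standard definition:

```python
# dist-n diversity metric over a batch of tokenized responses.

def distinct_n(responses, n):
    """Number of distinct n-grams divided by the total n-gram count."""
    ngrams = set()
    total = 0
    for tokens in responses:
        total += max(len(tokens) - n + 1, 0)
        for i in range(len(tokens) - n + 1):
            ngrams.add(tuple(tokens[i:i + n]))
    return len(ngrams) / total if total > 0 else 0.0

# Example: two near-identical responses yield a low dist-1.
responses = [["i", "don't", "know"], ["i", "don't", "know", "either"]]
print(distinct_n(responses, 1))  # 4 distinct unigrams / 7 tokens ≈ 0.571
print(distinct_n(responses, 2))  # 3 distinct bigrams / 5 bigrams = 0.6
```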
“…The TSM can be described as a conditional probability [32]. It predicts the probability of C conditioned on A and B, as described in (7).…”
Section: Triple-Seq2Seq Model
Mentioning confidence: 99%
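Equation (7) itself is not reproduced in this excerpt; as a hypothetical sketch, a Seq2Seq-style model of p(C | A, B) would typically factorize the response token by token:

```latex
% Hypothetical factorization; the excerpt's actual equation (7) is not shown.
p(C \mid A, B) \;=\; \prod_{t=1}^{|C|} p\bigl(c_t \mid c_{<t},\, A,\, B\bigr)
```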
“…We use 10-fold cross-validation, and only two types of features: n-grams and Word2Vec word embeddings. We expect Word2Vec to capture semantic generalizations that n-grams do not (Socher et al., 2013; Li et al., 2016). The n-gram features include unigrams, bigrams, and trigrams, including sequences of punctuation (for example, ellipses or "!!!…”
Section: Learning Experiments
Mentioning confidence: 99%
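A minimal sketch of that feature setup, assuming scikit-learn for the n-gram counts and a pre-loaded {word: vector} table (e.g., from a trained Word2Vec model) for the dense features. Names and settings here are illustrative, not taken from the cited paper.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

def embed_mean(text, vectors, dim):
    """Average the Word2Vec vectors of the in-vocabulary tokens."""
    hits = [vectors[w] for w in text.split() if w in vectors]
    return np.mean(hits, axis=0) if hits else np.zeros(dim)

def build_features(texts, vectors, dim):
    # Unigrams, bigrams, and trigrams; the whitespace token pattern keeps
    # punctuation runs such as "!!!" or "..." as features.
    ngram = CountVectorizer(ngram_range=(1, 3), token_pattern=r"\S+")
    x_ngram = ngram.fit_transform(texts).toarray()
    x_w2v = np.vstack([embed_mean(t, vectors, dim) for t in texts])
    return np.hstack([x_ngram, x_w2v])

# Toy usage; a full run would feed these features to a classifier under
# 10-fold cross-validation, e.g.
# sklearn.model_selection.cross_val_score(clf, X, y, cv=10),
# matching the excerpt's setup.
texts = ["great movie !!!", "so boring ..."]
vectors = {"great": np.ones(4), "boring": -np.ones(4)}
X = build_features(texts, vectors, dim=4)
print(X.shape)
```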