Proceedings of the 11th International Conference on Natural Language Generation 2018
DOI: 10.18653/v1/w18-6563

Decoding Strategies for Neural Referring Expression Generation

Abstract: RNN-based sequence generation is now widely used in NLP and NLG (natural language generation). Most work focusses on how to train RNNs, even though decoding is also not necessarily straightforward: previous work on neural MT found seq2seq models to radically prefer short candidates, and has proposed a number of beam search heuristics to deal with this. In this work, we assess decoding strategies for referring expression generation with neural models. Here, expression length is crucial: output should neither co…
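The short-candidate bias mentioned in the abstract arises because a hypothesis score is a sum of (negative) log-probabilities, so every additional token can only lower it. One common MT-style remedy, and one example of the beam-search heuristics alluded to, is length normalization of the hypothesis score. The toy sketch below is general background under a GNMT-style length penalty, not the method proposed in this paper; it only shows how normalization can flip the ranking in favour of a longer but per-token more confident candidate.

```python
# Toy illustration of the short-candidate bias in beam search and of length
# normalization as one common remedy (a GNMT-style length penalty; shown as
# general background, not as the method proposed in this paper).

def raw_score(token_log_probs):
    # plain sum of log-probabilities: every extra token can only lower the score
    return sum(token_log_probs)

def length_normalized_score(token_log_probs, alpha=0.6):
    # divide by a length penalty so longer, per-token confident hypotheses
    # are no longer penalized simply for being longer
    lp = ((5 + len(token_log_probs)) / 6) ** alpha
    return sum(token_log_probs) / lp

short = [-0.8, -0.8]                     # 2 tokens, less confident per token
long = [-0.45, -0.45, -0.45, -0.45]      # 4 tokens, more confident per token

print(raw_score(short), raw_score(long))            # about -1.60 vs -1.80: short wins
print(length_normalized_score(short),               # about -1.46
      length_normalized_score(long))                # about -1.41: long now wins
```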

Cited by 13 publications (14 citation statements)
References 27 publications
“…In sequence-to-sequence generation beyond MT, not a lot of work has been done on defining general stopping and normalization criteria. Zarrieß and Schlangen [70] present a study on decoding in referring expression generation (REG), a relatively constrained NLG sub-task where the length of the generated output is deemed central. They find that a variant of beam search that only keeps hypotheses of the same length, i.e., discards complete hypotheses that are not top candidates in the current time step, provides a better stopping criterion for REG than other criteria that have been explored in the MT literature.…”
Section: Algorithm (mentioning)
confidence: 99%
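The same-length beam-search variant described in the excerpt above can be illustrated with a short sketch. This is a minimal illustration, not the authors' released code; `step_log_probs` is a hypothetical callback standing in for a trained RNN decoder that returns next-token log-probabilities for a given prefix.

```python
# Minimal sketch of the same-length beam-search stopping criterion: all
# hypotheses on the beam have the same length, a hypothesis that emits EOS is
# only accepted if it is the top-scoring candidate at that step, and completed
# hypotheses that are not the current top candidate are simply discarded
# (rather than stored in a "finished" pool, as in standard beam search).

import math

def same_length_beam_search(step_log_probs, bos="<s>", eos="</s>",
                            beam_size=5, max_len=20):
    beam = [([bos], 0.0)]                          # (prefix, cumulative log-prob)
    for _ in range(max_len):
        candidates = []
        for prefix, score in beam:
            for tok, lp in step_log_probs(prefix).items():
                candidates.append((prefix + [tok], score + lp))
        candidates.sort(key=lambda c: c[1], reverse=True)

        best_prefix, best_score = candidates[0]
        if best_prefix[-1] == eos:
            return best_prefix, best_score         # stop: top candidate is complete

        # keep only incomplete hypotheses of the same (new) length
        beam = [(p, s) for p, s in candidates if p[-1] != eos][:beam_size]
        if not beam:
            break
    return beam[0] if beam else (None, -math.inf)
```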
“…Thus, when applying their model, Gu et al. [174] combine the trainable decoder with the beam search heuristic. Zarrieß and Schlangen [70] test Chen et al.'s [179] supervised approach in an REG experiment and combine it with greedy search to avoid the already discussed deficiencies of beam search. While Gu et al. [174] and Zarrieß and Schlangen [70] rely on BLEU as a reward for the decoder, other metrics and rewards might constitute more interesting options to optimize decoding for, e.g., conversational goals. For instance, Panagiaris et al. [118] present a transformer-based model for REG that incorporates RL and various decoding methods to balance the diversity and informativeness of referring expressions.…”
Section: Conversational Goals (mentioning)
confidence: 99%
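The BLEU-as-reward setup mentioned in this excerpt can be illustrated with a minimal REINFORCE-style loss. This is a hedged sketch under assumed PyTorch and NLTK APIs, not the training objective of the cited models; in practice a baseline would usually be subtracted from the reward to reduce variance.

```python
# Minimal sketch of using sentence-level BLEU as a reward for a sampled
# sequence, REINFORCE-style: the reward scales the negative log-likelihood
# of the sampled tokens. Not the cited implementations.

import torch
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def reinforce_bleu_loss(step_logits, sampled_ids, reference_tokens, id_to_token):
    """step_logits: (T, V) decoder logits; sampled_ids: (T,) sampled token ids."""
    log_probs = torch.log_softmax(step_logits, dim=-1)
    sampled_log_probs = log_probs[torch.arange(len(sampled_ids)), sampled_ids]

    hypothesis = [id_to_token[i] for i in sampled_ids.tolist()]
    reward = sentence_bleu([reference_tokens], hypothesis,
                           smoothing_function=SmoothingFunction().method1)

    # REINFORCE: maximizing expected reward == minimizing -reward * log p(sample)
    return -reward * sampled_log_probs.sum()
```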
“…Note that in the current set-up, we do not include context or distractor images to generate discriminative descriptions, but focus on exploring the modality aspect in the generation task. See Zarrieß and Schlangen (2018) for a detailed discussion of the benefit of context features in image-based REG.…”
Section: The RNN Caption Generator (mentioning)
confidence: 99%
“…Recent work in referring expression generation (REG) has focused more and more on large-scale image datasets (Kazemzadeh et al., 2014; Mao et al., 2016; Yu et al., 2016) and models that incorporate a state-of-the-art vision component (Mao et al., 2016; Yu et al., 2017; Zarrieß and Schlangen, 2018). As compared to traditional REG settings (Dale and Reiter, 1995; Krahmer and Van Deemter, 2012), these works have led to substantial advances in terms of the complexity of visual inputs that can be processed and the visual object categories that can be covered.…”
Section: Introduction (mentioning)
confidence: 99%
“…We investigate referring expression generation (REG henceforth), where the goal is to compute an utterance u that identifies a target referent r among other referents R in a visual scene. Research on REG has a long tradition in natural language generation (Krahmer and Van Deemter, 2012), and has recently been re-discovered in the area of Language & Vision (Mao et al., 2016; Yu et al., 2016; Zarrieß and Schlangen, 2018). These latter models for REG essentially implement variants of a standard neural image captioning architecture (Vinyals et al., 2015), combining a CNN and an LSTM to generate an utterance directly from objects marked via bounding boxes in real-world images.…”
Section: Introduction (mentioning)
confidence: 99%
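The generic CNN-plus-LSTM captioning architecture referred to in this excerpt can be sketched as follows. This is an illustrative stand-in (a ResNet-18 encoder feeding an LSTM decoder, assumed PyTorch/torchvision modules), not the specific models of the cited works; in practice the CNN would be pretrained and the region features would typically be combined with location and context features.

```python
# Minimal sketch of the generic CNN+LSTM captioning/REG architecture: a CNN
# encodes the cropped target region, and the resulting feature vector
# initializes an LSTM that generates the utterance (teacher forcing shown).

import torch
import torch.nn as nn
from torchvision import models

class CaptionREG(nn.Module):
    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512):
        super().__init__()
        cnn = models.resnet18(weights=None)          # visual encoder (pretrained in practice)
        cnn.fc = nn.Identity()                       # expose the 512-d feature vector
        self.cnn = cnn
        self.img_proj = nn.Linear(512, hidden_dim)
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, region_crop, token_ids):
        """region_crop: (B, 3, H, W) crop of the target bounding box;
        token_ids: (B, T) gold utterance tokens."""
        feats = self.img_proj(self.cnn(region_crop))  # (B, hidden_dim)
        h0 = feats.unsqueeze(0)                       # (1, B, hidden_dim) initial hidden state
        c0 = torch.zeros_like(h0)
        emb = self.embed(token_ids)                   # (B, T, embed_dim)
        hidden, _ = self.lstm(emb, (h0, c0))
        return self.out(hidden)                       # (B, T, vocab_size) logits
```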