Proceedings of the 11th International Conference on Natural Language Generation 2018
DOI: 10.18653/v1/w18-6563

Decoding Strategies for Neural Referring Expression Generation

Abstract: RNN-based sequence generation is now widely used in NLP and NLG (natural language generation). Most work focusses on how to train RNNs, even though decoding is also not necessarily straightforward: previous work on neural MT found seq2seq models to radically prefer short candidates, and has proposed a number of beam search heuristics to deal with this. In this work, we assess decoding strategies for referring expression generation with neural models. Here, expression length is crucial: output should neither co…
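The short-candidate bias mentioned in the abstract arises because a hypothesis score is a sum of (negative) log-probabilities, so every additional token can only lower it. One common MT-style remedy, and one example of the beam-search heuristics alluded to, is length normalization of the hypothesis score. The toy sketch below is general background under a GNMT-style length penalty, not the method proposed in this paper; it only shows how normalization can flip the ranking in favour of a longer but per-token more confident candidate.

```python
# Toy illustration of the short-candidate bias in beam search and of length
# normalization as one common remedy (a GNMT-style length penalty; shown as
# general background, not as the method proposed in this paper).

def raw_score(token_log_probs):
    # plain sum of log-probabilities: every extra token can only lower the score
    return sum(token_log_probs)

def length_normalized_score(token_log_probs, alpha=0.6):
    # divide by a length penalty so longer, per-token confident hypotheses
    # are no longer penalized simply for being longer
    lp = ((5 + len(token_log_probs)) / 6) ** alpha
    return sum(token_log_probs) / lp

short = [-0.8, -0.8]                     # 2 tokens, less confident per token
long = [-0.45, -0.45, -0.45, -0.45]      # 4 tokens, more confident per token

print(raw_score(short), raw_score(long))            # about -1.60 vs -1.80: short wins
print(length_normalized_score(short),               # about -1.46
      length_normalized_score(long))                # about -1.41: long now wins
```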

Cited by 13 publications (14 citation statements)
References 27 publications
“…In sequence-to-sequence generation beyond MT, not a lot of work has been done on defining general stopping and normalization criteria. Zarrieß and Schlangen [70] present a study on decoding in referring expression generation (REG), a relatively constrained NLG sub-task where the length of the generated output is deemed central. They find that a variant of beam search that only keeps hypotheses of the same length, i.e., discards complete hypotheses that are not top candidates in the current time step, provides a better stopping criterion for REG than other criteria that have been explored in the MT literature.…”
Section: Algorithm (mentioning)
confidence: 99%
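The same-length beam-search variant described in the excerpt above can be illustrated with a short sketch. This is a minimal illustration, not the authors' released code; `step_log_probs` is a hypothetical callback standing in for a trained RNN decoder that returns next-token log-probabilities for a given prefix.

```python
# Minimal sketch of the same-length beam-search stopping criterion: all
# hypotheses on the beam have the same length, a hypothesis that emits EOS is
# only accepted if it is the top-scoring candidate at that step, and completed
# hypotheses that are not the current top candidate are simply discarded
# (rather than stored in a "finished" pool, as in standard beam search).

import math

def same_length_beam_search(step_log_probs, bos="<s>", eos="</s>",
                            beam_size=5, max_len=20):
    beam = [([bos], 0.0)]                          # (prefix, cumulative log-prob)
    for _ in range(max_len):
        candidates = []
        for prefix, score in beam:
            for tok, lp in step_log_probs(prefix).items():
                candidates.append((prefix + [tok], score + lp))
        candidates.sort(key=lambda c: c[1], reverse=True)

        best_prefix, best_score = candidates[0]
        if best_prefix[-1] == eos:
            return best_prefix, best_score         # stop: top candidate is complete

        # keep only incomplete hypotheses of the same (new) length
        beam = [(p, s) for p, s in candidates if p[-1] != eos][:beam_size]
        if not beam:
            break
    return beam[0] if beam else (None, -math.inf)
```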
“…Thus, when applying their model, Gu et al. [174] combine the trainable decoder with the beam search heuristic. Zarrieß and Schlangen [70] test Chen et al.'s [179] supervised approach in an REG experiment and combine it with greedy search to avoid the already discussed deficiencies of beam search. While Gu et al. [174] and Zarrieß and Schlangen [70] rely on BLEU as a reward for the decoder, other metrics and rewards might constitute more interesting options to optimize decoding for, e.g., conversational goals. For instance, Panagiaris et al. [118] present a transformer-based model for REG that incorporates RL and various decoding methods to balance the diversity and informativeness of referring expressions.…”
Section: Conversational Goals (mentioning)
confidence: 99%
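The BLEU-as-reward setup mentioned in this excerpt can be illustrated with a minimal REINFORCE-style loss. This is a hedged sketch under assumed PyTorch and NLTK APIs, not the training objective of the cited models; in practice a baseline would usually be subtracted from the reward to reduce variance.

```python
# Minimal sketch of using sentence-level BLEU as a reward for a sampled
# sequence, REINFORCE-style: the reward scales the negative log-likelihood
# of the sampled tokens. Not the cited implementations.

import torch
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def reinforce_bleu_loss(step_logits, sampled_ids, reference_tokens, id_to_token):
    """step_logits: (T, V) decoder logits; sampled_ids: (T,) sampled token ids."""
    log_probs = torch.log_softmax(step_logits, dim=-1)
    sampled_log_probs = log_probs[torch.arange(len(sampled_ids)), sampled_ids]

    hypothesis = [id_to_token[i] for i in sampled_ids.tolist()]
    reward = sentence_bleu([reference_tokens], hypothesis,
                           smoothing_function=SmoothingFunction().method1)

    # REINFORCE: maximizing expected reward == minimizing -reward * log p(sample)
    return -reward * sampled_log_probs.sum()
```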
“…Note that in the current set-up, we do not include context or distractor images to generate discriminative descriptions, but focus on exploring the modality aspect in the generation task. See Zarrieß and Schlangen (2018) for a detailed discussion of the benefit of context features in image-based REG.…”
Section: The RNN Caption Generator (mentioning)
confidence: 99%
“…Recent work in referring expression generation (REG) has focused more and more on large-scale image datasets (Kazemzadeh et al., 2014; Mao et al., 2016; Yu et al., 2016) and models that incorporate a state-of-the-art vision component (Mao et al., 2016; Yu et al., 2017; Zarrieß and Schlangen, 2018). As compared to traditional REG settings (Dale and Reiter, 1995; Krahmer and Van Deemter, 2012), these works have led to substantial advances in terms of the complexity of visual inputs that can be processed and the visual object categories that can be covered.…”
Section: Introduction (mentioning)
confidence: 99%
“…We investigate referring expression generation (REG henceforth), where the goal is to compute an utterance u that identifies a target referent r among other referents R in a visual scene. Research on REG has a long tradition in natural language generation (Krahmer and Van Deemter, 2012), and has recently been re-discovered in the area of Language & Vision (Mao et al., 2016; Yu et al., 2016; Zarrieß and Schlangen, 2018). These latter models for REG essentially implement variants of a standard neural image captioning architecture (Vinyals et al., 2015), combining a CNN and an LSTM to generate an utterance directly from objects marked via bounding boxes in real-world images.…”
Section: Introduction (mentioning)
confidence: 99%
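The generic CNN-plus-LSTM captioning architecture referred to in this excerpt can be sketched as follows. This is an illustrative stand-in (a ResNet-18 encoder feeding an LSTM decoder, assumed PyTorch/torchvision modules), not the specific models of the cited works; in practice the CNN would be pretrained and the region features would typically be combined with location and context features.

```python
# Minimal sketch of the generic CNN+LSTM captioning/REG architecture: a CNN
# encodes the cropped target region, and the resulting feature vector
# initializes an LSTM that generates the utterance (teacher forcing shown).

import torch
import torch.nn as nn
from torchvision import models

class CaptionREG(nn.Module):
    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512):
        super().__init__()
        cnn = models.resnet18(weights=None)          # visual encoder (pretrained in practice)
        cnn.fc = nn.Identity()                       # expose the 512-d feature vector
        self.cnn = cnn
        self.img_proj = nn.Linear(512, hidden_dim)
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, region_crop, token_ids):
        """region_crop: (B, 3, H, W) crop of the target bounding box;
        token_ids: (B, T) gold utterance tokens."""
        feats = self.img_proj(self.cnn(region_crop))  # (B, hidden_dim)
        h0 = feats.unsqueeze(0)                       # (1, B, hidden_dim) initial hidden state
        c0 = torch.zeros_like(h0)
        emb = self.embed(token_ids)                   # (B, T, embed_dim)
        hidden, _ = self.lstm(emb, (h0, c0))
        return self.out(hidden)                       # (B, T, vocab_size) logits
```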