2014
DOI: 10.1162/tacl_a_00166
From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions

Abstract: We propose to use the visual denotations of linguistic expressions (i.e. the set of images they describe) to define novel denotational similarity metrics, which we show to be at least as beneficial as distributional similarities for two tasks that require semantic inference. To compute these denotational similarities, we construct a denotation graph, i.e. a subsumption hierarchy over constituents and their denotations, based on a large corpus of 30K images and 150K descriptive captions.
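As a concrete illustration of the abstract's core idea, the following is a minimal sketch in which each expression's denotation is modeled as the set of image IDs it describes, and similarity is the Jaccard overlap of those sets. This is only illustrative: the paper derives its metrics from the denotation graph rather than raw caption matching, and the denotation helper and toy corpus below are hypothetical.

# Minimal sketch: a denotational similarity between two expressions,
# modeling each denotation as the set of image IDs the expression
# describes. The Jaccard form is illustrative; the paper's metrics
# are derived from the denotation graph, not substring matching.
def denotation(expr: str, captions: dict[str, list[str]]) -> set[str]:
    """Images whose captions contain the expression (a crude stand-in
    for the paper's subsumption-based denotations)."""
    return {img for img, caps in captions.items()
            if any(expr in cap for cap in caps)}

def denotational_similarity(e1: str, e2: str,
                            captions: dict[str, list[str]]) -> float:
    d1, d2 = denotation(e1, captions), denotation(e2, captions)
    if not d1 or not d2:
        return 0.0
    return len(d1 & d2) / len(d1 | d2)

# Toy corpus: image ID -> descriptive captions.
captions = {
    "img1": ["a dog runs on the beach", "a dog plays in the sand"],
    "img2": ["a dog runs through a field"],
    "img3": ["a cat sleeps on a couch"],
}
print(denotational_similarity("dog runs", "dog plays", captions))  # 0.5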

Cited by 1,933 publications (1,179 citation statements)
References 13 publications
“…The inclusion of image data is one way to bring together the classical and the probabilistic approaches to NLI, whereby the image can be viewed as a (partial) representation of the ‘world’ described by the premise, with the entailment relationship being determined jointly from both. This is in line with the suggestion by Young et al. (2014) that images be considered as akin to the ‘possible worlds’ in which sentences (in this case, captions) receive their denotation.…”
Section: Introduction (supporting)
confidence: 89%
“…We focus on the subset of entailment pairs in the SNLI dataset (Bowman et al., 2015). The majority of instances in SNLI consist of premises that were originally elicited as descriptive captions for images in Flickr30k (Young et al., 2014; Liwei Wang et al., 2015). 1 In constructing the SNLI dataset, Amazon Mechanical Turk workers were shown the captions/premises without the corresponding images, and were asked to write new captions that were (i) true, given the premise (entailment); (ii) false, given the premise (contradiction); and (iii) possibly true (neutral).…”
Section: Data (mentioning)
confidence: 99%
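The entailment subset described in the statement above can be extracted in a few lines. A minimal sketch, assuming the Hugging Face datasets packaging of SNLI (where label 0 = entailment, 1 = neutral, 2 = contradiction, and -1 marks examples without a gold label); this packaging is an assumption of the sketch, not something the cited work specifies.

# Minimal sketch: keep only the entailment pairs from SNLI.
from datasets import load_dataset

snli = load_dataset("snli", split="train")
# Label 0 = entailment; -1 marks examples with no gold consensus.
entailment_pairs = snli.filter(lambda ex: ex["label"] == 0)

for ex in entailment_pairs.select(range(3)):
    print(ex["premise"], "=>", ex["hypothesis"])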
“…A number of gold-standard datasets are available for RTE (Marelli et al., 2014; Young et al., 2014; Levy et al., 2014). We consider the Stanford Natural Language Inference (SNLI) dataset (Bowman et al., 2015).…”
Section: Background and Related Work (mentioning)
confidence: 99%
“…A total of 31,783 RGB images were used. The dataset can be found at [41]. Furthermore, each image of the dataset is divided into blocks that are then scrambled.…”
Section: Dataset Preparation and Encryption (mentioning)
confidence: 99%
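The block-scrambling step quoted above can be sketched as follows. The 8x8 block size and the key-seeded NumPy permutation are illustrative assumptions, not the cited paper's actual encryption scheme.

# Minimal sketch: split an RGB image into fixed-size blocks and
# scramble them with a key-seeded permutation. Block size and the
# choice of NumPy's default generator are assumptions for
# illustration only.
import numpy as np

def scramble_blocks(image: np.ndarray, block: int, key: int) -> np.ndarray:
    h, w, c = image.shape
    assert h % block == 0 and w % block == 0, "image must tile evenly"
    # Cut the image into a flat list of (block x block x c) tiles.
    tiles = (image
             .reshape(h // block, block, w // block, block, c)
             .swapaxes(1, 2)
             .reshape(-1, block, block, c))
    # Permute the tiles with a deterministic, key-seeded shuffle.
    perm = np.random.default_rng(key).permutation(len(tiles))
    tiles = tiles[perm]
    # Reassemble the scrambled tile grid back into an image.
    return (tiles
            .reshape(h // block, w // block, block, block, c)
            .swapaxes(1, 2)
            .reshape(h, w, c))

# Example: scramble a random 64x64 RGB image with 8x8 blocks.
img = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
scrambled = scramble_blocks(img, block=8, key=1234)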