Some recent work has also investigated multimodal NLI, in which the entailment relation is classified on the basis of image features (Xie et al., 2019; Lai, 2018), or a combination of image and textual features (Vu et al., 2018). In particular, Vu et al. (2018) exploited the fact that the main portion of SNLI was created by reusing image captions from the Flickr30k dataset (Young et al., 2014) as premises, for which entailment, contradiction and neutral hypotheses were subsequently crowdsourced via Amazon Mechanical Turk (Bowman et al., 2015). This makes it possible to pair each premise with the image for which it was originally written as a descriptive caption, thereby reformulating the NLI problem as a Vision-Language task.
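To make the pairing concrete, the sketch below shows one way to link SNLI examples back to their Flickr30k source images via the captionID field in the SNLI distribution, which has the form `<image id>.jpg#<caption index>`. The file paths and the helper function name are illustrative assumptions, not part of either dataset's release.

```python
import json
import os

def pair_snli_with_flickr30k(snli_jsonl_path, flickr30k_image_dir):
    """Pair SNLI premises with the Flickr30k images they were
    originally written for, using SNLI's captionID field.

    captionID has the form '<image id>.jpg#<caption index>', so the
    image filename is recovered by dropping the '#<index>' suffix.
    Paths and directory layout are assumptions for illustration.
    """
    pairs = []
    with open(snli_jsonl_path, encoding="utf-8") as f:
        for line in f:
            example = json.loads(line)
            # SNLI marks examples with no annotator consensus with '-'.
            if example["gold_label"] == "-":
                continue
            image_file = example["captionID"].split("#")[0]
            pairs.append({
                "image_path": os.path.join(flickr30k_image_dir, image_file),
                "premise": example["sentence1"],
                "hypothesis": example["sentence2"],
                "label": example["gold_label"],
            })
    return pairs

# Illustrative usage; the file locations are assumptions.
# examples = pair_snli_with_flickr30k("snli_1.0_train.jsonl", "flickr30k-images/")
```

Each resulting record bundles an image with a premise-hypothesis pair, which is the grounded-NLI formulation described above.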