Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics 2020
DOI: 10.18653/v1/2020.acl-main.230

Cross-media Structured Common Space for Multimedia Event Extraction

Abstract: We introduce a new task, MultiMedia Event Extraction (M²E²), which aims to extract events and their arguments from multimedia documents. We develop the first benchmark and collect a dataset of 245 multimedia news articles with extensively annotated events and arguments. We propose a novel method, Weakly Aligned Structured Embedding (WASE), that encodes structured representations of semantic information from textual and visual data into a common embedding space. The structures are aligned across modaliti…

Cited by 67 publications
(44 citation statements)
References 40 publications
“…Cross-media Research. We are also related to cross-media research, where texts and images are jointly exploited for a variety of applications, such as personalized image captioning (Park et al, 2019), event extraction (Li et al, 2020), sarcasm detection (Cai et al, 2019), and text-image relation classification (Vempala and Preotiuc-Pietro, 2019). Some of them have pointed out the usefulness of OCR texts (Chen et al, 2016) and image attributes (Wu et al, 2016) to endow images with higher-level semantics beyond visual features, where we are the first to study how OCR texts and image attributes work together to indicate keyphrases.…”
Section: Related Work (mentioning)
confidence: 99%
“…Constructing a KG from each Multimedia News Article: We leverage a publicly available multimedia Information Extraction (IE) system (Li et al, 2020; Lin et al, 2020) to construct a within-document knowledge graph KG = (N_t, E_{r|a}) for each multimedia article. The IE system can extract 197 types of entities, 61 types of relations, and 144 types of events from text and images.…”
Section: Local KG Representation (mentioning)
confidence: 99%
“…SRL in Vision: has been explored in the context of human object interaction (Gupta and Malik, 2015), situation recognition (Yatskar et al, 2016), and multi-media extraction (Li et al, 2020). Most related to ours is the usage of SRLs for grounding (Silberer and Pinkal, 2018) in images and videos (Sadhu et al, 2020).…”
Section: Related Work (mentioning)
confidence: 99%