Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
DOI: 10.18653/v1/2020.emnlp-main.141
CMU-MOSEAS: A Multimodal Language Dataset for Spanish, Portuguese, German and French

Abstract: Figure 1: Overview of in-the-wild monologue videos and sentence utterances in the CMU-MOSEAS dataset. Each sentence is annotated with 20 labels, including sentiment, subjectivity, emotions and attributes. "L" denotes Likert (intensity) and "B" denotes binary label types. The example above is a Portuguese video.
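The label schema described in the caption (20 labels per sentence, split between Likert-scale intensities and binary indicators) maps naturally onto a small record type. Below is a minimal, hypothetical Python sketch of how one annotated sentence could be represented; the field names, value ranges, and example values are illustrative assumptions, not the dataset's official schema.

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class MoseasAnnotation:
    """Hypothetical container for one CMU-MOSEAS sentence annotation.

    Field names and value ranges are assumptions for illustration only;
    consult the official dataset release for the actual schema.
    """
    video_id: str
    sentence: str
    language: str                                             # e.g. "pt" for the Portuguese example in Figure 1
    likert: Dict[str, float] = field(default_factory=dict)    # "L" labels: graded intensities (e.g. sentiment)
    binary: Dict[str, bool] = field(default_factory=dict)     # "B" labels: present/absent attributes

# Illustrative example (all values are made up):
example = MoseasAnnotation(
    video_id="pt_000123",
    sentence="Eu adorei este filme.",   # "I loved this movie."
    language="pt",
    likert={"sentiment": 2.0, "subjectivity": 1.5},
    binary={"happiness": True, "sarcasm": False},
)
```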

Cited by 19 publications (12 citation statements)
References 54 publications
“…For future work, we will continue exploring this topic and expanding the framework to include more families of languages. As more benchmarks [48, 40, 3] on multilingual video-text pairs become available, we are interested in enhancing the grounding between vision and language by leveraging the temporal information from videos.…”
Section: Discussion
confidence: 99%
“…We formulate VQA as a multi-label classification problem, where the model predicts the answer from a candidate pool. The VQA score [20] is used to compare model predictions against 10 human-annotated answers in VQA v2.0. On Visual Genome VQA Japanese, which has only one ground-truth answer per question, we use accuracy and BLEU score as the evaluation metrics.…”
Section: Methods
confidence: 99%
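The VQA score referenced in the statement above is the consensus metric from the VQA benchmark, which credits a prediction with min(number of matching human answers / 3, 1). The sketch below shows this simplified form in Python; it is an illustrative approximation, not the official evaluation code, which additionally averages over subsets of the ten human answers and applies answer normalization.

```python
from collections import Counter
from typing import List

def vqa_score(prediction: str, human_answers: List[str]) -> float:
    """Simplified VQA accuracy: min(#humans giving this answer / 3, 1).

    The official VQA evaluation also averages this quantity over all
    10-choose-9 subsets of the ten human answers and normalizes answers
    (lowercasing, stripping punctuation and articles); both steps are
    omitted here for brevity.
    """
    counts = Counter(a.strip().lower() for a in human_answers)
    matches = counts.get(prediction.strip().lower(), 0)
    return min(matches / 3.0, 1.0)

# Example: 4 of 10 annotators answered "red", so the score saturates at 1.0.
print(vqa_score("red", ["red"] * 4 + ["maroon"] * 3 + ["dark red"] * 3))  # -> 1.0
```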
“…In the past five years, text-based aspect-level sentiment analysis has drawn much attention (Chen and Qian, 2019; Zhang and Qian, 2020; Zheng et al., 2020; Tulkens and van Cranenburgh, 2020; Akhtar et al., 2020). Meanwhile, multimodal target-oriented sentiment analysis has recently become increasingly important because of the urgent need to apply it in industry (Akhtar et al., 2019; Zadeh et al., 2020; Sun et al., 2021a; Tang et al., 2019; Zhang et al., 2020b, 2021a). In the following, we mainly overview the limited studies of multi-modal aspect term extraction and multi-modal aspect sentiment classification on text and image modalities.…”
Section: Related Work
confidence: 99%
“…Within the same category of multimodal fusion, we plan to add datasets within the same application domains as well as to expand to new application domains. Within the current domains, we plan to include (1) the hateful memes challenge [82] as a core challenge in multimedia to ensure safer learning from ubiquitous text and images from the internet, (2) more datasets in the robotics and HCI domains where there are many opportunities for multimodal modeling, and (3) several datasets which are of broad interest but are released under licenses that restrict redistribution, such as dyadic emotion recognition on IEMOCAP [21], deception prediction from real-world Trial Data [123], and multilingual affect recognition on CMU-MOSEAS [186], which was only recently released. We are currently working with the authors to integrate some of these datasets into MULTIBENCH in the near future.…”
Section: I11 Fusion
confidence: 99%