Multimodal Indicators of Humor in Videos

Yang, Zixiaofan; Lin, Aiming; Hirschberg, Julia

doi:10.1109/mipr.2019.00109

Cited by 7 publications

(5 citation statements)

References 14 publications


“…Kayatani et al [16] also uses the same TV drama series as their testbed and presents a model to predict whether an utterance of a character causes laughter based on subtitles as well as facial features and the identity of the character. Yang et al [18] obtains humor labels in videos based on user comments together with visual and audio features. The ground-truth humor labels in these methods are mainly associated with texts and a prediction is made for a sentence.…”

Section: Related Work (mentioning, confidence: 99%)

“…In recent years, some methods have been proposed to predict humor using both single modality and multiple modalities, which are often accompanied by a dedicated dataset [7][8][9][10][11][12]. Single modal humor prediction mainly uses the linguistic modality [13][14][15], while multiple modal humor prediction combines the information from different modalities [6][16][17][18]. The ground-truth labels of these methods are usually associated with blocks of text, like sentences and dialogues, while signals from other modalities are often treated as supplementary.…”

Section: Introduction (mentioning, confidence: 99%)


Multi-modal humor segment prediction in video

2023


Humor can be induced by various signals emitted by humans in the visual, linguistic, and vocal modalities. Finding humor in videos is an interesting but challenging task for an intelligent system. Previous methods predict humor at the sentence level given some text (e.g., a speech transcript), sometimes together with other modalities such as video and speech. Because their predictions are made per sentence, such methods are not designed to capture humor caused by the visual modality. In this work, we first provide new humor annotations for a sitcom by setting up temporal segments of ground-truth humor derived from the laughter track. We then propose a method to find these temporal segments of humor. We adopt a sliding-window approach, in which the visual modality is described by pose and facial features and the linguistic modality is given by the subtitles in each window. We use long short-term memory (LSTM) networks to encode the temporal dependencies in the pose and facial features, and pre-trained BERT to handle the subtitles. Experimental results show that our method improves the performance of humor prediction.
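The window-labelling step described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: humor segments are assumed to arrive as (start, end) times in seconds derived from laughter timestamps, and the window length, stride, and 50% coverage rule (`win`, `stride`, `min_frac`), as well as the helper names `merge`, `overlap`, and `label_windows`, are hypothetical. Feature extraction (LSTM over pose and facial features, BERT over subtitles) is outside the scope of the sketch.

```python
from dataclasses import dataclass

# Hypothetical helpers; window size, stride and the coverage rule are assumptions.


@dataclass(frozen=True)
class Window:
    start: float  # window start time in seconds
    end: float    # window end time in seconds
    label: int    # 1 = humorous window, 0 = non-humorous


def merge(segments):
    """Merge overlapping or touching (start, end) intervals."""
    merged = []
    for start, end in sorted(segments):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def overlap(a_start, a_end, b_start, b_end):
    """Length of the intersection of two intervals (0.0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_windows(humor_segments, duration, win=4.0, stride=1.0, min_frac=0.5):
    """Slide a fixed-length window over [0, duration] and label each window.

    A window is labelled 1 when at least `min_frac` of its length is covered
    by the (merged) humor segments; this coverage rule is an assumption.
    """
    segments = merge(humor_segments)
    n_windows = max(0, int((duration - win) // stride) + 1)
    windows = []
    for i in range(n_windows):
        start = i * stride
        end = start + win
        covered = sum(overlap(start, end, s, e) for s, e in segments)
        windows.append(Window(start, end, int(covered / win >= min_frac)))
    return windows
```

For example, `label_windows([(5.0, 8.0)], duration=12.0, win=4.0, stride=2.0)` labels the windows starting at 0, 2, 4, 6, and 8 seconds as `[0, 0, 1, 1, 0]`. Merging the segments first keeps overlapping laughter-derived segments from being counted twice when coverage is computed.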


“…MaSaC [40] consists of Hindi-English code-mixed sitcom dialogues manually annotated for the presence of humour as well as sarcasm. A different approach is presented in [41], where the authors obtained humour labels by exploiting time-aligned user comments for videos on the Chinese video platform Bilibili. Hasan et al [17] compile their dataset UR-Funny from TED talk recordings, using laughter markup in the provided transcripts to automatically label punchline sentences in the recorded talks.…”

Section: Multimodal Humour Recognition (mentioning, confidence: 99%)

“…For Open Mic, Mittal et al [42] collected standup comedy recordings and used the audience's laughter to create annotations indicating the degree of humour on a scale from zero to four. Similar to text-only datasets, most multimodal datasets are in English, notable exceptions being the already mentioned MUMOR-ZH [19], MaSaC [40], the Chinese dataset used in [41] and M2H2 [43], which is based on a Hindi TV show.…”

Section: Multimodal Humour Recognition (mentioning, confidence: 99%)

“…An aspect not covered by Table 3 is the annotation level. Different from all existing humour databases except the one created by Yang et al [41], PASSAU-SFCH is labelled in a time-continuous manner. All other datasets listed in Table 3 are annotated at utterance level, i. e., an utterance is either a punchline/joke or not.…”

Section: Comparison With Other Humour Datasets (mentioning, confidence: 99%)


Multimodal Prediction of Spontaneous Humour: A Novel Dataset and First Results

Christ, Amiriparian, Kathan et al. 2022

Preprint


Humour is a substantial element of human affect and cognition. Its automatic understanding can facilitate a more naturalistic human-device interaction and the humanisation of artificial intelligence. Current methods of humour detection are based solely on staged data, making them inadequate for 'real-world' applications. We address this deficiency by introducing the novel Passau-Spontaneous Football Coach Humour (Passau-SFCH) dataset, comprising about 11 hours of recordings. The Passau-SFCH dataset is annotated for the presence of humour and its dimensions (sentiment and direction), as proposed in Martin's Humor Styles Questionnaire. We conduct a series of experiments, employing pretrained Transformers, convolutional neural networks, and expert-designed features. The performance of each modality (text, audio, video) for spontaneous humour recognition is analysed, and their complementarity is investigated. Our findings suggest that for the automatic analysis of humour and its sentiment, facial expressions are most promising, while humour direction can be best modelled via text-based features. The results reveal considerable differences among subjects, highlighting the individuality of humour usage and style. Further, we observe that decision-level fusion yields the best recognition result. Finally, we make our code publicly available at https://www.github.com/EIHW/passau-sfch. The Passau-SFCH dataset is available upon request.
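Decision-level (late) fusion, which this abstract reports as giving the best recognition result, combines the outputs of separately trained per-modality models rather than their features. A minimal sketch, assuming each modality model emits one humour probability per segment and that fusion is a weighted mean followed by a fixed threshold; the function name `late_fusion`, the equal default weights, and the 0.5 threshold are hypothetical, and the paper's actual fusion scheme may differ.

```python
def late_fusion(probs_by_modality, weights=None, threshold=0.5):
    """Decision-level fusion of per-modality humour probabilities.

    probs_by_modality maps a modality name (e.g. "text", "audio", "video")
    to a list of P(humour) values, one per segment, all in the same order.
    Returns the fused probabilities and the thresholded 0/1 decisions.
    """
    # Hypothetical scheme: weighted arithmetic mean, then a fixed threshold.
    lengths = {len(p) for p in probs_by_modality.values()}
    if len(lengths) != 1:
        raise ValueError("all modalities must score the same segments")
    n_segments = lengths.pop()
    if weights is None:
        weights = {m: 1.0 for m in probs_by_modality}  # equal weights by default
    total = sum(weights[m] for m in probs_by_modality)
    fused = [
        sum(weights[m] * probs[i] for m, probs in probs_by_modality.items()) / total
        for i in range(n_segments)
    ]
    decisions = [int(p >= threshold) for p in fused]
    return fused, decisions
```

With `{"text": [0.0, 1.0], "audio": [0.5, 0.5], "video": [0.25, 0.75]}` and equal weights, the fused probabilities are 0.25 and 0.75, giving decisions `[0, 1]`. Passing uneven `weights` is one way a late-fusion system could favour a stronger modality; whether the authors weighted modalities is not stated in the abstract.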
