2022
DOI: 10.3390/mti6040028

Emotion Classification from Speech and Text in Videos Using a Multimodal Approach

Abstract: Emotion classification is a research area that has generated an intensive body of literature spanning natural language processing, multimedia data, semantic knowledge discovery, social network mining, and text and multimedia data mining. This paper addresses the issue of emotion classification and proposes a method for classifying the emotions expressed in multimodal data extracted from videos. The proposed method models multimodal data as a sequence of features extracted from facial expressions, s…
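As a rough illustration of the kind of pipeline the abstract describes, the Python sketch below fuses hypothetical per-clip feature vectors from the three modalities (face, speech, text) and trains a simple classifier. It is a minimal sketch under assumed shapes and labels, not the paper's method, which models the multimodal data as a sequence of features.

```python
# Minimal illustrative sketch (NOT the paper's implementation): feature-level
# fusion of per-clip face, speech, and text features for emotion classification.
# All array shapes, the number of classes, and the classifier are assumptions;
# random data stands in for features extracted upstream.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_clips = 200                               # hypothetical number of video clips
face = rng.normal(size=(n_clips, 64))       # e.g. facial-expression descriptors
speech = rng.normal(size=(n_clips, 40))     # e.g. summarized acoustic features
text = rng.normal(size=(n_clips, 100))      # e.g. transcript embeddings
labels = rng.integers(0, 6, size=n_clips)   # six hypothetical emotion classes

# Early fusion: concatenate the modality vectors for each clip.
fused = np.concatenate([face, speech, text], axis=1)

clf = LogisticRegression(max_iter=1000).fit(fused[:150], labels[:150])
print("held-out accuracy:", clf.score(fused[150:], labels[150:]))
```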

Cited by 17 publications (2 citation statements) · References 105 publications
“…The most prominent emotion linked to eyebrow movement is that of surprise. This association has often been noted [45–48] and is, in a way, archetypical from an evolutionary point of view. Eyebrows are raised as a result of opening one's eyes wide, as a symptom of "attentional activity" [20,48–50].…”
Section: Facial Gestures and Audiovisual Prosody
confidence: 82%
“…Rao et al. (2021) employed speech-based LSTM features alongside classifiers based on k-nearest neighbors (kNN), Bayesian networks, hidden Markov models (HMMs), and artificial neural networks (ANNs), applied to facial-expression and acoustic features (Gaussian mixture models, Mel-frequency cepstral coefficients) on the RAVDESS (Ryerson Audio-Visual Database of Emotional Speech and Song) audio dataset. Caschera et al. (2022) performed emotion classification from speech and text in videos using a multimodal approach, demonstrating automatic extraction of emotional information from data provided by different interaction modalities and different domains. Abdu et al. (2021) reported a survey of deep learning approaches to multimodal video sentiment analysis, discussing sentiment analysis systems built on the multimodal multi-utterance architecture.…”
Section: Multimodal Sentiment Analysis
confidence: 99%
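The citing passage above names MFCCs and k-nearest neighbors among the techniques applied to RAVDESS. As a self-contained sketch of that acoustic pipeline only (file paths, labels, and hyperparameters below are hypothetical placeholders, not the cited authors' code):

```python
# Illustrative MFCC + kNN sketch for speech emotion classification.
# Paths, labels, and parameter choices are placeholders, not taken from
# any of the cited papers.
import librosa
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def mfcc_features(path: str, n_mfcc: int = 13) -> np.ndarray:
    """Load a clip and summarize frame-level MFCCs as one fixed-length vector."""
    y, sr = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # (n_mfcc, frames)
    # Mean and standard deviation over time give a 2 * n_mfcc summary vector.
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

# Hypothetical training set: WAV paths paired with emotion labels.
train_paths = ["clip_001.wav", "clip_002.wav"]   # placeholder files
train_labels = ["happy", "sad"]                  # placeholder labels

X = np.stack([mfcc_features(p) for p in train_paths])
knn = KNeighborsClassifier(n_neighbors=1).fit(X, train_labels)
print(knn.predict([mfcc_features("clip_new.wav")]))  # placeholder query clip
```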