2021
DOI: 10.1609/icwsm.v10i1.14810

Fusing Audio, Textual, and Visual Features for Sentiment Analysis of News Videos

Abstract: This paper presents a novel approach to perform sentiment analysis of news videos, based on the fusion of audio, textual and visual clues extracted from their contents. The proposed approach aims at contributing to the semiodiscoursive study regarding the construction of the ethos (identity) of this media universe, which has become a central part of the modern-day lives of millions of people. To achieve this goal, we apply state-of-the-art computational methods for (1) automatic emotion recognition from facial…

Cited by 11 publications (6 citation statements)
References 14 publications
“…[6] employed an audio-visual approach to sentiment analysis by using sophisticated models on the Spotify dataset and a vast collection of movie clips, wherein an AUC of 0.652 was obtained. Sentiment analysis of News Videos was conducted by Pereira et al [19] based on the audio, visual and textual features of these videos, using a myriad of ML techniques, achieving an accuracy of 75%. Luo et al [15] used a parallel combination of an LSTM and CNN based network to conduct audio-based sentiment detection on the MOSI dataset.…”
Section: Literature Review (mentioning)
confidence: 99%
“…Videos and images are analyzed using different suitable model to provide better prediction. Multi modal based solution using suitable model according to the information type is a novel one [11] [16]. The entire application will be in single framework from the collection of data with the help of social media or chatbot analyzing the data using suitable model and the sending the relevant data to the rescue team [12].…”
Section: International Journal On Recent and Innovation Trends In Com... (mentioning)
confidence: 99%
“…The benefits arising from a combined analysis of different features extracted from different modalities were demonstrated in [73], where a combination of the modulations in speech, textual clues, and facial expressions extracted from videos improved the identification of the level of tension from newscasts.…”
Section: Video Data and Emotion Classification (mentioning)
confidence: 99%