2018
DOI: 10.29007/7mhj
Audio Sentiment Analysis by Heterogeneous Signal Features Learned from Utterance-Based Parallel Neural Network

Abstract: Audio sentiment analysis is a popular research area that extends text-based sentiment analysis to rely on the effectiveness of acoustic features extracted from speech. However, current work on audio sentiment analysis mainly focuses on extracting homogeneous acoustic features or does not fuse heterogeneous features effectively. In this paper, we propose an utterance-based deep neural network model, with a parallel combination of CNN- and LSTM-based networks, to obtain representative features termed Au…
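To make the abstract's parallel CNN/LSTM combination concrete, here is a minimal PyTorch sketch assuming frame-level acoustic features as input; all layer sizes, the pooling choice and the concatenation-based fusion are illustrative assumptions, not the paper's exact architecture.

import torch
import torch.nn as nn

class ParallelCNNLSTM(nn.Module):
    def __init__(self, n_features=40, n_classes=2):
        super().__init__()
        # CNN branch: treats the utterance's feature matrix as a one-channel image.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
            nn.Flatten(),                          # -> 16 * 4 * 4 = 256 features
        )
        # LSTM branch: consumes the same frames as a time sequence.
        self.lstm = nn.LSTM(input_size=n_features, hidden_size=64, batch_first=True)
        # Fusion head: concatenates both branch representations.
        self.head = nn.Linear(256 + 64, n_classes)

    def forward(self, x):                          # x: (batch, time, n_features)
        cnn_out = self.cnn(x.unsqueeze(1))         # add a channel dimension
        _, (h_n, _) = self.lstm(x)                 # h_n: (1, batch, 64)
        fused = torch.cat([cnn_out, h_n[-1]], dim=1)
        return self.head(fused)                    # sentiment logits

model = ParallelCNNLSTM()
logits = model(torch.randn(8, 100, 40))            # 8 utterances, 100 frames each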

Cited by 47 publications (20 citation statements) | References 15 publications
“…Sentiment analysis of News Videos was conducted by Pereira et al [19] based on the audio, visual and textual features of these videos, using a myriad of ML techniques, achieving an accuracy of 75%. Luo et al [15] used a parallel combination of an LSTM and CNN based network to conduct audio-based sentiment detection on the MOSI dataset. Naïve Bayes algorithm was used on the Twitter dataset by Parveen et al [18] for sentiment analysis, which yielded an accuracy of 57%.…”
Section: Literature Review
confidence: 99%
“…The research field of emotion recognition using audio-based technologies has gained increasing attention in recent years, leading to a growing number of research studies in the field [42,43,44,45,46]. For instance, the work in [6] proposes a Bidirectional Long Short-Term Memory (Bi-LSTM) with attention model to classify four different emotions, "Anger", "Excitement", "Neutral" and "Sadness", from the IEMOCAP dataset, which includes a range of dyadic sessions where actors perform improvisations or scripted scenarios specifically selected to elicit emotional expressions.…”
Section: Inference From Audio
confidence: 99%
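The Bi-LSTM-with-attention classifier mentioned in the excerpt above could look roughly like the following sketch; the hidden size and the simple learned frame-scoring attention are assumptions, with only the four-class output taken from the cited setup.

import torch
import torch.nn as nn

class BiLSTMAttention(nn.Module):
    def __init__(self, n_features=40, hidden=64, n_classes=4):
        super().__init__()
        self.bilstm = nn.LSTM(n_features, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)       # one relevance score per frame
        self.out = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):                          # x: (batch, time, n_features)
        h, _ = self.bilstm(x)                      # (batch, time, 2 * hidden)
        weights = torch.softmax(self.attn(h), dim=1)
        context = (weights * h).sum(dim=1)         # attention-weighted summary
        return self.out(context)                   # logits over the four emotions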
“…This is because sentiment recognition depends on detecting a very focused vocabulary in the spoken comments (Kaushik, Sangwan, & Hansen, ). Furthermore, when the voice is transcribed into text, some sentiment-related signal characteristics are also lost, resulting in a decrease in the classification accuracy of sentiments (Luo et al., ).…”
Section: Unimodal SDA Approaches In Education
confidence: 99%
“…Based on a large number of experiments, a BiLSTM with the attention mechanism is employed to capture the sentiment information of each utterance in audio. Specifically, spectrograms, spectral features and cepstral coefficients produced by the signals are fed into the CNN and LSTM for feature fusion (Luo, Xu, & Chen, ). Compared with the raw acoustic signal, texture analysis of the audio spectral image can effectively characterize sentiment states (Özseven, ).…”
Section: Unimodal SDA Approaches In Education
confidence: 99%
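The heterogeneous inputs named in the excerpt above, a spectrogram image for a CNN branch and cepstral coefficients for an LSTM branch, can be extracted roughly as in the sketch below (using librosa; the file path, sampling rate and feature dimensions are placeholders, not values from the cited work).

import librosa

# Load one utterance (path and sampling rate are placeholders).
y, sr = librosa.load("utterance.wav", sr=16000)

# Log-mel spectrogram: a 2-D "image" suitable for a CNN branch.
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=64)
log_mel = librosa.power_to_db(mel)                 # shape: (64, n_frames)

# MFCCs (cepstral coefficients): a frame sequence suitable for an LSTM branch.
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13) # shape: (13, n_frames)
mfcc_seq = mfcc.T                                  # (n_frames, 13), time-major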