2021
DOI: 10.1016/j.specom.2020.12.009

Learning deep multimodal affective features for spontaneous speech emotion recognition

Cited by 66 publications
(17 citation statements)
References 23 publications
“…In contrast to classic machine learning, deep learning mostly deals with either raw speech signals or time-frequency representations, and often shows better performance than classic machine learning in recent advances [18], [19]. Compared to 1D raw speech signals, 2D time-frequency representations have become more popular [20]. Typical time-frequency representations include spectrograms, Mel spectrograms, log Mel spectrograms, and Mel Frequency Cepstral Coefficients (MFCCs) [21], [22], [23].…”
Section: Speech Emotion Recognition
Citation type: mentioning
confidence: 99%
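
To make the representations named in the statement above concrete, here is a minimal sketch of how a log Mel spectrogram can be derived from a 1D signal. It is not taken from the cited paper or from the citing work. The function names, frame length (`n_fft=512`), hop size (`hop=160`), number of Mel bands (`n_mels=40`), the 16 kHz sampling rate, and the synthetic test tone are all assumptions chosen for illustration.

```python
import numpy as np

def hz_to_mel(f):
    # Common HTK-style Mel scale formula (one of several conventions).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrogram(signal, n_fft=512, hop=160):
    # Slice the signal into overlapping Hann-windowed frames,
    # then take the squared magnitude of each frame's FFT.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2  # (n_frames, n_fft // 2 + 1)

def mel_filterbank(sr, n_fft, n_mels=40):
    # Triangular filters with centers spaced evenly on the Mel scale.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    bank = np.zeros((n_mels, len(fft_freqs)))
    for m in range(n_mels):
        low, center, high = hz_points[m], hz_points[m + 1], hz_points[m + 2]
        rising = (fft_freqs - low) / (center - low)
        falling = (high - fft_freqs) / (high - center)
        bank[m] = np.maximum(0.0, np.minimum(rising, falling))
    return bank

def log_mel_spectrogram(signal, sr=16000, n_fft=512, hop=160, n_mels=40):
    spec = power_spectrogram(signal, n_fft, hop)        # spectrogram
    mel = spec @ mel_filterbank(sr, n_fft, n_mels).T    # Mel spectrogram
    return np.log(mel + 1e-10)                          # log Mel spectrogram

# Hypothetical input: one second of a 440 Hz tone at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440.0 * t)
features = log_mel_spectrogram(tone, sr=sr)  # 2D array: frames x Mel bands
```

The result is a 2D frames-by-bands array, which is the kind of image-like input the statement contrasts with 1D raw speech. MFCCs would typically follow by applying a discrete cosine transform along the Mel-band axis, a step omitted here.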
“…The accuracy of the proposed system was significantly better in comparison to the individual classifier. Zhang et al. [28] proposed multi-CNN to learn multimodal audio features from spontaneous speech for emotion recognition. Through a fusion of different features, complementary information could be generated which significantly improved the accuracy of emotion recognition.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…In recent years, with the rapid development of neural network, its advantages in the field of time series modeling and generation have been widely concerned and applied. In the field of natural language processing, neural networks have made breakthroughs in language modeling, speech recognition, and machine translation [3]. For the field of computer vision, it has excellent performance in object recognition, visual tracking, image generation, video analysis, etc.…”
Section: Introduction
Citation type: mentioning
confidence: 99%