2021
DOI: 10.1109/access.2021.3068045

A Comprehensive Review of Speech Emotion Recognition Systems

Abstract: During the last decade, Speech Emotion Recognition (SER) has emerged as an integral component within Human-computer Interaction (HCI) and other high-end speech processing systems. Generally, an SER system targets the speaker's existence of varied emotions by extracting and classifying the prominent features from a preprocessed speech signal. However, the way humans and machines recognize and correlate emotional aspects of speech signals are quite contrasting quantitatively and qualitatively, which present enor…

Citations: cited by 172 publications (79 citation statements)
References: 97 publications
“…According to the reviews of Wani et al [28] and Berkeham and Oguz [27], we can distinguish two main ways to perform speech emotion recognition: by using traditional classifiers or deep-learning classifiers.…”
Section: Speech Emotion Recognition (mentioning)
confidence: 99%
“…Speech is an incredibly powerful means of communication, as it not only codifies linguistic information, i.e., a message, but also provides paralinguistic cues about the emotional state of the speaker [1]. Emotion is, therefore, a key element for seamless human-computer interaction (HCI), both in the input channel, by means of emotional speech recognition [2] and in its output, through expressive speech synthesis [3], among others. In this context, emotions have been traditionally represented: (i) using a dimensional space, such as the circumplex model, which is defined by arousal, valence, and dominance [4]; or, (ii) as discrete categories, such as those defined in [5] and denoted as the big six basic emotions, namely, anger, disgust, fear, happiness, sadness, and surprise.…”
Section: Introduction (mentioning)
confidence: 99%
“…In this context, emotions have been traditionally represented: (i) using a dimensional space, such as the circumplex model, which is defined by arousal, valence, and dominance [4]; or, (ii) as discrete categories, such as those defined in [5] and denoted as the big six basic emotions, namely, anger, disgust, fear, happiness, sadness, and surprise. According to these categories, expressive speech databases are built containing spontaneous speech, elicited speech, or acted speech [2].…”
Section: Introduction (mentioning)
confidence: 99%
“…Natural language processing (NLP) is a field of research that deals with machine learning (ML) algorithms applied to human natural languages [5]. NLP applications aim to automatically process written and spoken human languages including sentiment analysis [6,7], sarcasm detection [8], machine translation [9], speech recognition [10], automated dialogue systems [11], urban studies [12,13], topic classification [14], similarity detection [15], text summarization [16], intent detection [17], news and social media analysis [18,19], part-of-speech (POS) tagging [20], authorship attribution [21,22], fake tweet detection [23], coreference resolution [24] and others [14,25-27]. Recently, NLP techniques have also been employed to study the sentiments and attitudes of social media users regarding the COVID-19 pandemic [28,29].…”
Section: Introduction (mentioning)
confidence: 99%