2022
DOI: 10.3390/s22197561

A Comparison of Machine Learning Algorithms and Feature Sets for Automatic Vocal Emotion Recognition in Speech

Abstract: Vocal emotion recognition (VER) in natural speech, often referred to as speech emotion recognition (SER), remains challenging for both humans and computers. Applied fields including clinical diagnosis and intervention, social interaction research, and Human-Computer Interaction (HCI) increasingly benefit from efficient VER algorithms. Several feature sets have been used with machine-learning (ML) algorithms for discrete emotion classification. However, there is no consensus on which low-level descriptors and classif…
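As a rough illustration of the pipeline the abstract describes (low-level acoustic descriptors fed to an ML classifier), the following is a minimal sketch in Python. The file names, labels, MFCC features, and SVM classifier are illustrative assumptions, not the specific feature sets or algorithms compared in the paper.

```python
# Minimal SER sketch: summarize each utterance with low-level descriptors
# (here: MFCC means/stds, an assumption) and train a standard classifier
# (here: an RBF-kernel SVM, also an assumption).
import numpy as np
import librosa
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def utterance_features(path, sr=16000, n_mfcc=13):
    """Mean and standard deviation of MFCCs over time, per utterance."""
    y, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

# Hypothetical corpus: WAV files paired with discrete emotion labels.
paths = ["angry_01.wav", "happy_01.wav", "sad_01.wav"]  # placeholders
labels = ["anger", "happiness", "sadness"]              # placeholders

X = np.stack([utterance_features(p) for p in paths])
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X, labels)         # toy fit; real work needs a larger corpus
print(clf.predict(X[:1]))  # and proper cross-validation
```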

Cited by 12 publications (4 citation statements)
References 51 publications
“…Candidate sets of acoustic parameters have been identified for affective speech (e.g., Eyben et al., 2010, 2016; Schuller et al., 2009) and nonverbal vocalizations (Sauter et al., 2010). However, testing responses to changing acoustic parameters and their combinations poses challenges for both human participants and machine learning algorithms (e.g., Doğdu et al., 2022) and is likely to be a focus of future research. Another approach to creating neutral versions of stimuli is to synthesize them.…”
Section: Discussion
confidence: 99%
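To make the manipulation challenge in the statement above concrete, here is a minimal sketch that alters a single acoustic parameter (fundamental frequency, via pitch shifting) of a hypothetical stimulus file; the file name and the two-semitone shift are assumptions, not a procedure from the cited studies.

```python
# Illustrative only: shift one acoustic parameter (pitch) of a vocal
# stimulus, as one might when testing responses to parameter changes.
import librosa
import soundfile as sf

y, sr = librosa.load("stimulus.wav", sr=None)            # hypothetical file
y_up = librosa.effects.pitch_shift(y, sr=sr, n_steps=2)  # +2 semitones
sf.write("stimulus_pitch_up.wav", y_up, sr)
```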
“…Finally, it should be emphasized that this study focused … Regarding methods, there is no consensus about which set of acoustic parameters is optimal for classifying emotions in speech. [33] GeMAPS [11] was developed to analyze affect and emotions in speech. It has been validated for analyses of sustained phonations as well as whole sentences.…”
Section: Discussion
confidence: 99%
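The GeMAPS parameter set mentioned in the statement above can be extracted with the open-source opensmile Python wrapper; a minimal sketch follows, assuming the package is installed and a local WAV file exists.

```python
# Minimal sketch: extract GeMAPS functionals for one utterance with the
# `opensmile` package (the file name is a placeholder).
import opensmile

smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.GeMAPSv01b,
    feature_level=opensmile.FeatureLevel.Functionals,
)
features = smile.process_file("speech.wav")  # pandas DataFrame, one row
print(features.shape)  # (1, 62): the 62 GeMAPS functional descriptors
```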
“…For example, facial emotion expressions or speech were traditionally measured using electrodes or sensor coils (e.g., facial electromyography [172], real-time magnetic resonance imaging (MRI), or electromagnetic articulography [123], respectively). Although precise and well-established, such methods have the disadvantage of being bound to a lab, unlike more recent contact-free approaches such as video recordings for the analysis of facial expressions, and acoustic analysis for automatic speech recognition and recognition of emotions in speech [40,170]. Video-based assessment of body pose is also increasingly used in clinical populations [111]; however, developing efficient machine learning algorithms for semantic analysis of body pose and gestures remains a challenge.…”
Section: Interaction With Other Fields
confidence: 99%