Interspeech 2021
DOI: 10.21437/interspeech.2021-703
Emotion Recognition from Speech Using wav2vec 2.0 Embeddings

Cited by 202 publications (83 citation statements). References 0 publications.
“…Dissanayake et al [80] used the last two participants in the validation and test sets, respectively, reaching an accuracy of 56.71% on the speech modality. With a variation of this setup, Pepino et al [40] used only the last two participants as the test set and combined the 'Calm' and 'Neutral' emotions, reducing the problem from eight emotions to seven classes. Under these conditions, the top accuracy reached by their model was 77.5%, applying a global normalization.…”
Section: Comparative Results With Related Approaches
confidence: 99%
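The split described in this excerpt can be reproduced directly from RAVDESS filenames, which encode the emotion and actor in fixed hyphen-separated fields. Below is a minimal sketch, assuming the standard RAVDESS naming scheme (emotion code in the third field, actor ID 1-24 in the seventh) and a hypothetical data_dir; it illustrates the split and class merge, not the authors' own code.

# Sketch: speaker-based RAVDESS split with 'Calm' merged into 'Neutral'.
# Assumes the standard RAVDESS filename layout, e.g. 03-01-06-01-02-01-12.wav,
# where the third field is the emotion code and the seventh is the actor ID.
from pathlib import Path

EMOTIONS = {"01": "neutral", "02": "calm", "03": "happy", "04": "sad",
            "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised"}

def load_split(data_dir, test_actors=(23, 24)):
    train, test = [], []
    for wav in Path(data_dir).rglob("*.wav"):
        fields = wav.stem.split("-")
        label = EMOTIONS[fields[2]]
        if label == "calm":          # merge 'Calm' into 'Neutral': 8 -> 7 classes
            label = "neutral"
        actor = int(fields[6])
        (test if actor in test_actors else train).append((wav, label))
    return train, test

Holding out the last actors (rather than random utterances) keeps the evaluation speaker-independent, which is why the reported accuracies across these works are only comparable when the same actors are held out.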
“…Prosody, spectral, and voice quality-based features were used to train a hierarchical DNN classifier, achieving an accuracy of 81.2% on the RAVDESS dataset. Pepino et al [40] combined hand-crafted features with deep models, using eGeMAPS features together with the embeddings extracted from Wav2Vec to train a CNN model. They achieved an accuracy of 77.5% when applying a global normalization on this dataset.…”
Section: Speech Emotion Recognition
confidence: 99%
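The feature combination these statements describe (eGeMAPS functionals alongside wav2vec 2.0 embeddings) can be sketched with the transformers and opensmile Python packages. The model checkpoint, the use of the last hidden layer, and mean pooling over time are illustrative assumptions here, not the exact configuration of the cited paper.

# Sketch: combine wav2vec 2.0 embeddings with eGeMAPS functionals for one file.
# Checkpoint choice, layer choice, and mean pooling are illustrative assumptions.
import numpy as np
import torch
import librosa
import opensmile
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")
model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base").eval()
smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.eGeMAPSv02,    # 88 eGeMAPS functionals
    feature_level=opensmile.FeatureLevel.Functionals,
)

def extract_features(wav_path):
    audio, sr = librosa.load(wav_path, sr=16000)    # wav2vec 2.0 expects 16 kHz
    inputs = extractor(audio, sampling_rate=sr, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # shape (1, frames, 768)
    w2v = hidden.mean(dim=1).squeeze(0).numpy()     # mean-pool over time
    egemaps = smile.process_signal(audio, sr).to_numpy().ravel()
    return np.concatenate([w2v, egemaps])           # 768 + 88 = 856 dims

The concatenated vector would then feed a downstream classifier such as the CNN mentioned in the excerpts.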
“…Other works such as [76] used the last two participants in the validation and test sets, respectively, reaching an accuracy of 56.71% on the speech modality. With a variation of this set-up, we also found the work of Pepino et al [34], which used only the last two participants as the test set and combined the 'Calm' and 'Neutral' emotions, reducing the problem from eight emotions to seven classes. Under these conditions, the top accuracy reached by their model was 77.5%, applying a global normalization.…”
Section: Comparative Results With Previous Work
confidence: 99%
“…For example, Singh et al [33] suggested the use of prosody, spectral information, and voice quality features to train a hierarchical DNN classifier, reaching an accuracy of 81.2% on RAVDESS. Pepino et al [34] combined eGeMAPS features with the embeddings extracted from an xlsr-Wav2Vec2.0 model to train a CNN model. They achieved an accuracy of 77.5% by applying a global normalization on this dataset.…”
Section: Speech Emotion Recognition
confidence: 99%
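The "global normalization" mentioned in several of these excerpts plausibly refers to standardizing each feature with statistics computed over the whole training set, as opposed to per-speaker statistics; that reading is an assumption here, and the sketch below only illustrates it.

# Sketch: global z-normalization, i.e. one mean/std per feature dimension
# computed over the entire training set and reused unchanged on the test set
# (in contrast to per-speaker normalization, where statistics come from each speaker).
import numpy as np

def fit_global_stats(train_feats):            # train_feats: (n_utts, n_dims)
    mean = train_feats.mean(axis=0)
    std = train_feats.std(axis=0) + 1e-8      # avoid division by zero
    return mean, std

def normalize(feats, mean, std):
    return (feats - mean) / std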