Interspeech 2022
DOI: 10.21437/interspeech.2022-10490

End-To-End Label Uncertainty Modeling for Speech-based Arousal Recognition Using Bayesian Neural Networks

Abstract: Speech emotion conversion aims to convert the expressed emotion of a spoken utterance to a target emotion while preserving the lexical information and the speaker's identity. In this work, we specifically focus on in-the-wild emotion conversion where parallel data does not exist, and the problem of disentangling lexical, speaker, and emotion information arises. In this paper, we introduce a methodology that uses self-supervised networks to disentangle the lexical, speaker, and emotional content of the utteranc…

Citations: cited by 9 publications (13 citation statements)
References: 22 publications
“…This extension is also the first in literature to present SER results in this novel dataset [31]. Existing analyses and experiments from [38], [39] were also extended to MSP-Conversation. Moreover, we performed additional experiments that include an experiment to understand the impact of the number of annotations available, and an ablation study.…”
Section: Introduction (mentioning)
confidence: 94%
“…Research efforts have also [been] made to estimate emotion annotations as a distribution, using LDL [26], [38], [39], [47]. Foteinopoulou et al. [26] trained a MTL network using a KL divergence loss that models emotion annotations as a uni-variate Gaussian with mean m and unknown variance.…”
Section: Label Uncertainty in SER (mentioning)
confidence: 99%
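
The citation statement above mentions a KL divergence loss over annotations modeled as a univariate Gaussian. As a minimal sketch of that general idea, and not the cited authors' implementation, the closed-form KL divergence between two univariate Gaussians can be computed as below. The function names `annotation_gaussian` and `gaussian_kl` are hypothetical, and estimating the label variance from the sample of ratings is an assumption made here for illustration; the statement itself describes the variance as unknown.

```python
import math


def annotation_gaussian(ratings):
    """Summarize per-annotator ratings as (mean, std) of a Gaussian.

    Assumption: the sample (n - 1) variance is used as the label variance.
    """
    n = len(ratings)
    if n < 2:
        raise ValueError("need at least two ratings to estimate a variance")
    mean = sum(ratings) / n
    var = sum((r - mean) ** 2 for r in ratings) / (n - 1)
    return mean, math.sqrt(var)


def gaussian_kl(mean_p, std_p, mean_q, std_q):
    """Closed-form KL(P || Q) for univariate Gaussians P and Q."""
    if std_p <= 0 or std_q <= 0:
        raise ValueError("standard deviations must be positive")
    return (
        math.log(std_q / std_p)
        + (std_p ** 2 + (mean_p - mean_q) ** 2) / (2 * std_q ** 2)
        - 0.5
    )


# Hypothetical example: three annotator ratings versus a model prediction.
label_mean, label_std = annotation_gaussian([1.0, 2.0, 3.0])
loss = gaussian_kl(label_mean, label_std, mean_q=2.5, std_q=1.0)
```

The divergence is zero only when both the mean and the standard deviation match, so a loss of this form penalizes a prediction that gets the average rating right but misjudges how much the annotators disagree.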