Interspeech 2018
DOI: 10.21437/interspeech.2018-2508

Role of Regularization in the Prediction of Valence from Speech

Abstract: Regularization plays a key role in improving the prediction of emotions using attributes such as arousal, valence and dominance. Regularization is particularly important with deep neural networks (DNNs), which have millions of parameters. While previous studies have reported competitive performance for arousal and dominance, the prediction results for valence using acoustic features are significantly lower. We hypothesize that higher regularization can lead to better results for valence. This study focuses on …

Cited by 21 publications (17 citation statements)
References 25 publications
“…The corresponding relative improvements observed for arousal and dominance were less than 4%. These results showed that valence emotional cues include more speaker-dependent traits, explaining why heavily regularizing a DNN helps to learn more general emotional cues across speakers [15]. Building on these results, we propose an unsupervised personalization approach that is extremely useful in the prediction of valence.…”
Section: Introduction (mentioning)
confidence: 70%
“…Although arousal and dominance have similar accuracy with dynamicOverlap (i.e., the differences are not statistically significant), our proposed Self-AttenVec method achieved the best valence CCC result (CCC=0.3337). Valence is an attribute that is particularly challenging to predict with acoustic features [52], [53], indicating that complete sentence-level information can bring complemental benefits for more complex tasks. The advantage of applying attention models is amplified in the CNN and functional models.…”
Section: Proposed Chunk-level SER Results (mentioning)
confidence: 99%
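The statement above reports results as CCC but does not define it. A minimal sketch, assuming CCC here means the standard concordance correlation coefficient (Lin, 1989), which penalizes both low correlation and differences in mean or scale between predictions and labels. The function name `concordance_cc` and the sample values are hypothetical, chosen only for illustration:

```python
def concordance_cc(predictions, labels):
    """Concordance correlation coefficient, using population statistics.

    CCC = 2 * cov(p, l) / (var(p) + var(l) + (mean(p) - mean(l)) ** 2)
    """
    n = len(predictions)
    if n == 0 or n != len(labels):
        raise ValueError("inputs must be non-empty and of equal length")

    mean_p = sum(predictions) / n
    mean_l = sum(labels) / n

    # Population (divide-by-n) variance and covariance.
    var_p = sum((p - mean_p) ** 2 for p in predictions) / n
    var_l = sum((l - mean_l) ** 2 for l in labels) / n
    cov = sum((p - mean_p) * (l - mean_l)
              for p, l in zip(predictions, labels)) / n

    denominator = var_p + var_l + (mean_p - mean_l) ** 2
    if denominator == 0:
        raise ValueError("CCC is undefined for identical constant inputs")
    return 2 * cov / denominator


# Hypothetical valence scores: perfectly correlated but offset by 1.0,
# so Pearson correlation would be 1 while CCC is lower.
shifted = concordance_cc([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
```

Unlike Pearson correlation, a constant offset between predictions and labels lowers the score, which is why CCC is a common choice for evaluating attribute-based emotion predictions.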
“…Adam [27] optimizer and exponential decay learning rate with initial rate 1e-3, decay rate 0.93 for every epoch, and final rate 5e-5 are used to optimize parameters. For the regularization, dropout with rate 0.7 as suggested in [28] is used for the output of encoder; l1 and l2 regularization with the weight 5e-3 are used for training RECOLA and IEMOCAP respectively. We train the models for 50 epochs with a batch size of 32, and 30% of data from test set is used as the development set for early stopping.…”
Section: Methods (mentioning)
confidence: 99%
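The learning-rate schedule quoted in the statement above can be sketched as follows. This is one plausible reading, not the citing authors' code: it assumes the rate is multiplied by 0.93 once per epoch and that the "final rate" of 5e-5 acts as a lower bound. The function name `learning_rate` and its parameter names are hypothetical:

```python
def learning_rate(epoch, initial=1e-3, decay=0.93, floor=5e-5):
    """Exponentially decayed learning rate for a zero-indexed epoch.

    Assumption: the quoted "final rate" is treated as a floor, so the
    rate never drops below it once the decay would go lower.
    """
    if epoch < 0:
        raise ValueError("epoch must be non-negative")
    return max(initial * decay ** epoch, floor)


# Rates for the 50 training epochs mentioned in the statement.
schedule = [learning_rate(epoch) for epoch in range(50)]
```

Under these assumptions the decayed value falls below 5e-5 partway through training, so the last several of the 50 epochs would run at the floor rate.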