2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2018.8461299
Soft-Target Training with Ambiguous Emotional Utterances for DNN-Based Speech Emotion Classification

Cited by 41 publications (27 citation statements)
References 6 publications
“…Even though listener variability affects the majority decision as to emotion perception, there is little work that considers the listener in SER. One related work is soft-label / multi-label emotion recognition; it models the distribution of emotion perception of listeners [17,29,30], but it cannot distinguish individuals. In music emotion recognition, several studies have tackled listener-wise perception [31,32].…”
Section: I. Related Work
confidence: 99%
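The soft-label approach this excerpt describes turns multiple listeners' categorical judgments into a probability distribution over emotion classes, which then serves as the training target instead of a single hard label. A minimal sketch of that idea, assuming a four-class emotion set, simple vote-fraction targets, and a cross-entropy loss (the class names, votes, and posterior values are illustrative, not taken from the paper):

```python
import math

# Illustrative emotion inventory (an assumption, not the paper's exact set).
EMOTIONS = ["angry", "happy", "neutral", "sad"]

def soft_target(annotations):
    """Turn per-listener labels into a probability distribution (soft label)."""
    counts = {e: 0 for e in EMOTIONS}
    for label in annotations:
        counts[label] += 1
    total = len(annotations)
    return [counts[e] / total for e in EMOTIONS]

def cross_entropy(p, q, eps=1e-12):
    """Cross-entropy between a soft target p and a model posterior q."""
    return -sum(pi * math.log(max(qi, eps)) for pi, qi in zip(p, q))

# One utterance labeled "happy" by two listeners and "neutral" by one:
target = soft_target(["happy", "happy", "neutral"])   # [0, 2/3, 1/3, 0]
posterior = [0.1, 0.6, 0.2, 0.1]                      # hypothetical DNN output
loss = cross_entropy(target, posterior)
```

The point the citing papers make is that this distribution captures perceptual ambiguity across listeners, but collapses them into one pooled target, so individual listeners can no longer be told apart.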
“…The database consists of a total of 12 h of English utterances generated by improvised or scripted scenarios specifically written to represent the emotional expressions. As in several conventional studies [11,14,16,27,29], we used only audio tracks of the improvised set since scripted data may contain undesired contextual information. There are six listeners in the corpus and every utterance was annotated by three of them.…”
Section: A) Datasets
confidence: 99%
“…There are several studies that leverage listener-dependent emotion perceptions for SER. One approach is to model the distribution of emotion perception of listeners by soft-label [14] or multivariate Gaussian [15]. However, it cannot distinguish individual listeners.…”
Section: Introduction
confidence: 99%
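The Gaussian alternative mentioned here models the spread of listener perception for one utterance as a multivariate Gaussian rather than a categorical distribution. A minimal sketch, assuming dimensional (valence, arousal) ratings from each listener and a maximum-likelihood (biased) covariance estimate; the rating values are invented for illustration:

```python
def fit_gaussian(ratings):
    """Fit the mean and biased (1/n) covariance of per-listener ratings."""
    n = len(ratings)
    dim = len(ratings[0])
    mean = [sum(r[d] for r in ratings) / n for d in range(dim)]
    cov = [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in ratings) / n
            for j in range(dim)] for i in range(dim)]
    return mean, cov

# Three listeners' hypothetical (valence, arousal) ratings for one utterance:
mean, cov = fit_gaussian([(0.8, 0.6), (0.6, 0.4), (0.7, 0.5)])
```

As with the soft-label scheme, the fitted distribution summarizes listener disagreement for the utterance but discards which listener gave which rating.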
“…Specifically, the human brain's attention to each area of an image differs, so it focuses on certain areas with salient characteristics. Recently, the attention mechanism has been successfully applied to object detection [30,31] and classification [32], natural language processing (NLP) [33,34], machine translation [35,36], etc. In order to extract more discriminative deep speech features and thereby improve recognition performance, several attention-based deep models have been developed for SER in recent years.…”
Section: Introduction
confidence: 99%
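The attention-based SER models this excerpt refers to typically weight frame-level speech features by learned relevance scores before pooling them into an utterance-level representation. A generic sketch of that pooling step, not tied to any specific cited model; the frame features and scores are placeholders:

```python
import math

def attention_pool(frames, scores):
    """Weight frame features by softmax(scores) and sum into one vector."""
    m = max(scores)                      # subtract max for numerical stability
    exp = [math.exp(s - m) for s in scores]
    z = sum(exp)
    weights = [e / z for e in exp]
    dim = len(frames[0])
    return [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]

# Two hypothetical frame vectors with equal scores -> plain averaging:
pooled = attention_pool([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
```

With uneven scores, the softmax weights let emotionally salient frames dominate the pooled feature, which is the discriminative effect the cited works pursue.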