2019
DOI: 10.48550/arxiv.1907.00112
Preprint

Leveraging Acoustic Cues and Paralinguistic Embeddings to Detect Expression from Voice

Cited by 3 publications (5 citation statements)
References 0 publications
“…Emotion classification is also beneficial in the paralinguistic field. Recently, commercial digital assistant applications, such as Siri, have found that paralinguistic information, such as emotion, is beneficial for recognizing the speaker's intent [2]. Humans usually employ multimodal information to identify emotions.…”
Section: Introduction | mentioning | confidence: 99%
“…For uni- and multi-modal inputs, 2-layer TC-GRU models were trained, where the performance on a held-out Valid1.6 set (see Table 1) was used for model selection. The concordance correlation coefficient (CCC) [21] is used as the loss function L_ccc (see (1)), where L_ccc is a combination (α = 1/3 and β = 1/3) of the CCCs obtained from each of the three emotion dimensions. CCC is defined by (2), where µ_x and µ_y are the means, σ²_x and σ²_y the corresponding variances of the estimated and ground-truth variables, and ρ is the correlation coefficient between those variables.…”
Section: Model Training | mentioning | confidence: 99%
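The loss described in this excerpt can be made concrete. Below is a minimal NumPy sketch, assuming the standard CCC definition 2ρσ_xσ_y / (σ²_x + σ²_y + (µ_x − µ_y)²) and the common training formulation L_ccc = 1 − (weighted sum of per-dimension CCCs) with equal weights of 1/3 for three emotion dimensions; the function names and the 1 − CCC form are assumptions, not the cited authors' code.

```python
# A minimal sketch of the CCC loss described in the excerpt above
# (not the authors' implementation).
import numpy as np

def ccc(x: np.ndarray, y: np.ndarray) -> float:
    """Concordance correlation coefficient between estimates x and targets y."""
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    rho = np.corrcoef(x, y)[0, 1]  # Pearson correlation coefficient
    # sigma_x * sigma_y == sqrt(var_x * var_y)
    return 2 * rho * np.sqrt(var_x * var_y) / (var_x + var_y + (mu_x - mu_y) ** 2)

def ccc_loss(pred: np.ndarray, target: np.ndarray) -> float:
    """1 - weighted CCC over three emotion dimensions (columns), weights 1/3 each.

    The 1 - CCC form (so that minimizing the loss maximizes agreement) is a
    common convention and assumed here; the excerpt only gives the weights.
    """
    weights = np.array([1 / 3, 1 / 3, 1 / 3])
    cccs = np.array([ccc(pred[:, d], target[:, d]) for d in range(3)])
    return 1.0 - float(weights @ cccs)
```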
“…Current human-machine interaction systems can recognize the words said by the speaker but fail to acknowledge the expressed emotion. Reliable and robust speech-based emotion models can help improve human-computer interaction and health/wellness applications, such as voice assistants [1,2], clinical mental health diagnoses [3,4], and/or therapy treatments [5].…”
Section: Introduction | mentioning | confidence: 99%
“…In speech communication, people make use of two types of information to convey their intentions: what is said (linguistic information) and how it is said (paralinguistic information) [1]. Paralinguistic information is expressed by suprasegmental features such as duration, intensity, and pitch.…”
Section: Introduction | mentioning | confidence: 99%
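As an illustration of the suprasegmental cues this excerpt names, here is a hedged Python sketch that extracts an utterance's duration, a frame-wise intensity proxy (RMS energy), and a pitch (F0) track with librosa; the audio file name is hypothetical, and this is not code from the cited work.

```python
# Sketch: extracting duration, intensity, and pitch from one utterance.
import librosa
import numpy as np

y, sr = librosa.load("utterance.wav", sr=16000)  # hypothetical input file

duration = librosa.get_duration(y=y, sr=sr)      # seconds

# Intensity proxy: frame-wise RMS energy
rms = librosa.feature.rms(y=y)[0]

# Pitch track via the pYIN algorithm; unvoiced frames come back as NaN
f0, voiced_flag, voiced_probs = librosa.pyin(
    y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr
)

print(f"duration = {duration:.2f} s")
print(f"mean RMS = {rms.mean():.4f}")
print(f"mean F0  = {np.nanmean(f0):.1f} Hz")
```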
“…In the literature on a semantic framework called Alternative Semantics [3,4], focus is understood as the indication of 'the presence of alternatives that are relevant for the interpretation of linguistic expressions' [5]. Consider this example: (1) a. [John]_F bought the apple. b. John bought [the apple]_F.…”
Section: Introduction | mentioning | confidence: 99%