VESUS: A Crowd-Annotated Database to Study Emotion Production and Perception in Spoken English

Sager, Jacob; Shankar, Ravi; Reinhold, Jacob C.; Venkataraman, Archana

doi:10.21437/interspeech.2019-1413

Cited by 16 publications

(16 citation statements)

References 18 publications

“…In the Toronto Emotional Speech Set (TESS) [143], 2 speakers speak 200 words with 7 different emotions in the carrier phrase ("Say the word ..."). A recent VESUS database [144] is designed and released with over 250 distinct phrases, each read by ten actors in five emotional states. These databases mark valuable practice to understand the emotion variance in the word or phrase level, but may not be suitable to build a state-of-the-art emotional voice conversion framework that is usually data-driven.…”

Section: Lexical Variability (mentioning, confidence: 99%)

Emotional Voice Conversion: Theory, Databases and ESD

Zhou, Şişman, Li et al., 2021 (preprint)

In this paper, we first provide a review of the state-of-the-art emotional voice conversion research, and the existing emotional speech databases. We then motivate the development of a novel emotional speech database (ESD) that addresses the increasing research need. With this paper, the ESD database is now made available to the research community. The ESD database consists of 350 parallel utterances spoken by 10 native English and 10 native Chinese speakers and covers 5 emotion categories (neutral, happy, angry, sad and surprise). More than 29 hours of speech data were recorded in a controlled acoustic environment. The database is suitable for multi-speaker and cross-lingual emotional voice conversion studies. As case studies, we implement several state-of-the-art emotional voice conversion systems on the ESD database. This paper provides a reference study on ESD in conjunction with its release.
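The corpus sizes quoted on this page follow from simple multiplication, assuming a fully crossed design in which every speaker records every text in every emotion (an assumption; the descriptions do not state it outright). A minimal Python sketch; `corpus_size` is a hypothetical helper:

```python
def corpus_size(texts: int, speakers: int, emotions: int) -> int:
    """Hypothetical helper: recordings in a fully crossed design, where every
    speaker records every text (word or phrase) in every emotion."""
    return texts * speakers * emotions

# Figures taken from the descriptions quoted on this page.
tess = corpus_size(texts=200, speakers=2, emotions=7)        # TESS: 200 words, 2 speakers, 7 emotions
vesus_min = corpus_size(texts=250, speakers=10, emotions=5)  # VESUS: "over 250" phrases, so a lower bound
esd = corpus_size(texts=350, speakers=10 + 10, emotions=5)   # ESD: 10 English + 10 Chinese speakers

# "More than 29 hours" of ESD speech bounds the mean utterance length from below.
esd_mean_seconds = 29 * 3600 / esd

print(tess, vesus_min, esd)         # 2800 12500 35000
print(f"{esd_mean_seconds:.2f} s")  # 2.98 s
```

Under that assumption, TESS holds 2,800 recordings, VESUS at least 12,500, and ESD 35,000, so "more than 29 hours" of ESD speech implies a mean utterance length of at least roughly 3 seconds.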

“…The ability of the regressor to differentiate between emotions resp. place the emotions in the AV space was tested on ten publicly available databases: EmoDB [19], EMOVO [20], RAVDESS [21], CREMA-D [22], SAVEE [23], VESUS [24], eNTERFACE [25], JL Corpus [26], TESS [27], and GEES [28]. These databases are categorically annotated and do not include information on AV values.…”

Section: Testing Databases (mentioning, confidence: 99%)

Mapping Discrete Emotions in the Dimensional Space: An Acoustic Approach

et al. 2021

A frequently used procedure to examine the relationship between categorical and dimensional descriptions of emotions is to ask subjects to place verbal expressions representing emotions in a continuous multidimensional emotional space. This work chooses a different approach. It aims to create a system that predicts the values of Activation and Valence (AV) directly from the sound of emotional speech utterances, without using their semantic content or any other additional information. The system uses X-vectors to represent the sound characteristics of the utterance and a Support Vector Regressor to estimate the AV values. The system is trained on a pool of three publicly available databases with dimensional annotation of emotions. The quality of regression is evaluated on the test sets of the same databases. Mapping of categorical emotions to the dimensional space is tested on another pool of eight categorically annotated databases. The aim of the work was to test whether, in each unseen database, the predicted values of Valence and Activation would place emotion-tagged utterances in the AV space in accordance with expectations based on Russell's circumplex model of affective space. Due to the great variability of speech data, clusters of emotions create overlapping clouds. Their average location can be represented by centroids. A hypothesis on the position of these centroids is formulated and evaluated. The system's ability to separate the emotions is evaluated by measuring the distances between the centroids. It can be concluded that the system works as expected and the positions of the clusters follow the hypothesized rules. Although the variance in individual measurements is still very high and the overlap of emotion clusters is large, it can be stated that the AV coordinates predicted by the system lead to an observable separation of the emotions in accordance with the hypothesis.
Knowledge from training databases can therefore be used to predict AV coordinates of unseen data of various origins. This could be used to detect high levels of stress or depression. With the appearance of more dimensionally annotated training data, the systems predicting emotional dimensions from speech sound will become more robust and usable in practical applications in call-centers, avatars, robots, information-providing systems, security applications, and the like.
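The centroid analysis described in this abstract can be sketched in a few lines: average the predicted (Valence, Activation) pairs per emotion label, then measure the Euclidean distance between every pair of centroids. This is a minimal Python sketch, assuming the regressor's predictions are already available as (label, valence, activation) tuples. The X-vector extraction and the Support Vector Regressor are not reproduced, and the function names, the expected-quadrant table (a simplified reading of Russell's circumplex, not the paper's exact hypothesis) and the toy data are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from math import dist


def emotion_centroids(predictions):
    """Average predicted (valence, activation) pairs per emotion label.

    predictions: iterable of (label, valence, activation) tuples, e.g. the
    regressor's outputs on a categorically annotated test database.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # label -> [sum_v, sum_a, count]
    for label, valence, activation in predictions:
        acc = sums[label]
        acc[0] += valence
        acc[1] += activation
        acc[2] += 1
    return {label: (sv / n, sa / n) for label, (sv, sa, n) in sums.items()}


def centroid_distances(centroids):
    """Euclidean distance between every pair of emotion centroids in AV space."""
    return {
        (a, b): dist(centroids[a], centroids[b])
        for a, b in combinations(sorted(centroids), 2)
    }


# Hypothetical expected quadrants: (sign of valence, sign of activation).
EXPECTED_SIGNS = {"anger": (-1, 1), "sadness": (-1, -1), "happiness": (1, 1)}


def _sign(x):
    return 1 if x >= 0 else -1


def matches_expectation(centroids, expected=EXPECTED_SIGNS):
    """For each labelled centroid, report whether it lies in its expected quadrant."""
    return {
        label: (_sign(v), _sign(a)) == expected[label]
        for label, (v, a) in centroids.items()
        if label in expected
    }


# Toy predictions, invented for illustration only.
preds = [
    ("anger", -0.6, 0.8), ("anger", -0.4, 0.6),
    ("sadness", -0.5, -0.6), ("sadness", -0.3, -0.4),
    ("happiness", 0.6, 0.5), ("happiness", 0.4, 0.7),
]
cents = emotion_centroids(preds)
dists = centroid_distances(cents)
checks = matches_expectation(cents)
```

For these toy values the centroids land near anger (-0.5, 0.7), sadness (-0.4, -0.5) and happiness (0.5, 0.6), all in their expected quadrants. Larger centroid distances indicate better separation, but with real data the clusters overlap heavily, so the centroids summarize only the average location of each emotion.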

“…The content of the corpus is semantically constant to allow the tone of the delivery to play a greater role in predicting the emotion of an instance. The same methodology is also used in [42] while designing a database for the English language. Other dataset for English language include MSP-IPROV [7], RAVDESS [31], SAVEE [23] and VESUS [42].…”

Section: Related Work (mentioning, confidence: 99%)

SEMOUR: A Scripted Emotional Speech Repository for Urdu

Zaheer, Ahmad, Ahmed et al., 2021 (preprint)
